Preface



ICGT 2004 was the 2nd International Conference on Graph Transformation, 
following the first one in Barcelona (2002), and a series of six international 
workshops on graph grammars with applications in computer science between 
1978 and 1998. ICGT 2004 was held in Rome (Italy), Sept. 29-Oct. 1, 2004 
under the auspices of the European Association for Theoretical Computer Sci- 
ence (EATCS), the European Association of Software Science and Technology 
(EASST), and the IFIP WG 1.3, Foundations of Systems Specification. 

The scope of the conference concerned graphical structures of various kinds 
(like graphs, diagrams, visual sentences and others) that are useful when de- 
scribing complex structures and systems in a direct and intuitive way. These 
structures are often augmented with formalisms that add to the static descrip- 
tion a further dimension, allowing for the modelling of the evolution of systems 
via all kinds of transformations of such graphical structures. The field of graph 
transformation is concerned with the theory, applications, and implementation 
issues of such formalisms. 

The theory is strongly related to areas such as graph theory and graph al- 
gorithms, formal language and parsing theory, the theory of concurrent and 
distributed systems, formal specification and verification, logic, and semantics. 
The application areas include all those fields of computer science, information 
processing, engineering, and the natural sciences where static and dynamic mod- 
elling using graphical structures and graph transformations, respectively, play 
important roles. In many of these areas tools based on graph transformation 
technology have been implemented and used. 

The proceedings of ICGT 2004 consist of two parts. The first part comprises 
the contributions of the invited talks followed by the carefully reviewed and 
accepted 26 papers that were selected out of 58 submissions. The topics of the 
papers range over a wide spectrum, including graph theory and graph algorithms, 
theoretic and semantic aspects, modelling, applications in chemistry and biology, 
and tool issues. The second part contains two tutorial introductions to graph 
transformation and their relation to software and DNA computing, and short 
presentations of the satellite events of ICGT 2004. 

We would like to thank the members of the program committee and the 
secondary reviewers for their enormous help in the selection process. We are 
also grateful to Reiko Heckel and Alexey Cherchago for their technical support
in running the conference system and in editing the proceedings. Moreover, we
would like to express our gratitude to the local organizers, Paolo Bottoni (Chair)
and Marta Simeoni, who did a great job. Finally, we would like to acknowledge
the always excellent cooperation with Springer, the publisher of the Lecture 
Notes in Computer Science. 
Improving Flow in Software Development
Through Graphical Representations* 



Margaret-Anne D. Storey

University of Victoria, British Columbia, Canada 
mstorey@uvic.ca



Abstract. Software development is a challenging and time intensive 
task that requires much tool support to enhance software comprehen- 
sion and collaborative work in software engineering. Many of the popular 
tools used in industry offer simple, yet highly effective, graphical aids to 
enhance programming tasks. In particular, tree views are frequently used 
to present features in the software and to facilitate navigation. General 
graph layouts, popular in many academic tools, are seen less frequently 
in industrial software development tools. Interactive graphs can allow a 
developer to visualize and manipulate non-structural relationships and 
abstractions in the software. In this presentation, I explore how graphical 
techniques developed in academia can improve “flow” for programmers 
using industrial development tools. The theory of “flow and optimal ex- 
periences” is used to offer rich explanations for the existence of many 
typical software tool features and to illuminate areas for potential im- 
provements from graphical tool support. 



* An extended version of this abstract is published in the IEEE proceedings of 
VL/HCC’04 (IEEE Symposium on Visual Languages and Human-Centric Comput- 
ing), Rome, Italy, September 26-29, 2004. 
A Perspective on Graphs
and Access Control Models 



Ravi Sandhu 

George Mason University and NSD Security 
ISE Department, MS4A4 
George Mason University 
Fairfax, VA 22030, USA 

sandhu@gmu.edu
http://www.list.gmu.edu



Abstract. There would seem to be a natural connection between graphs 
and information security. This is particularly so in the arena of access 
control and authorization. Research on applying graph theory to access 
control problems goes back almost three decades. Nevertheless it is yet 
to make its way into the mainstream of access control research and prac- 
tice. Much of this prior research is based on first principles, although 
more recently there have been significant efforts to build upon existing 
graph theory results and approaches. This paper gives a perspective on 
some of the connections between graphs and their transformations and 
access control models, particularly with respect to the safety problem 
and dynamic role hierarchies. 



1 Introduction 

In concept there appears to be a strong potential for graphs and their transfor- 
mations to be applied to information security problems. In practice, however, 
this potential largely remains to be realized. Applications of graph theory in 
the security domain go back almost three decades and there has been a steady 
trickle of papers exploring this potential. Nonetheless graph theory has yet to 
make its way into the mainstream of security research and practice. In part this 
may be due to the relative youth of the security discipline and the particular 
focus of the research community in the early years. Because of the versatility of 
graph representations and graph theory techniques perhaps it is only a matter 
of time before a strong and compelling connection is found. 

Information security is a broad field and offers multiple avenues for applica- 
tion of graph theory. To pick just two examples, in recent years we have seen 
application of graph theory in penetration testing and vulnerability analysis [2, 
7, 17, 20, 29] and in authentication metrics [21]. It is beyond the scope of this pa- 
per to consider the vast landscape of information security. Rather we will focus 
on the specific area of access control and authorization. 

We begin with a brief review of access control and access control models, and 
then identify two specific problems of access control where graph theory has been 
employed in the past. These are the so-called safety problem and the problem
of dynamic hierarchies. The rest of the paper explores past work in these two 
problem areas in some detail and concludes with a brief discussion of possible 
future research. 



Access Control 

Access control is concerned with the question of who can do what in a computer 
system. Clearly the same object (such as a file) may be accessible by different 
users in different ways. Some users may be able to read and write the file, 
others may only read it, and still others may have no access to the file. Strictly
speaking users do not manipulate files directly but rather do so via programs 
(such as a text editor or word processor). A program executing on behalf of a 
user is called a subject, so access control is concerned with enforcing authorized 
access of subjects to objects. This basic idea was introduced by Lampson in a 
classic paper [14] and continues to be the central abstraction of access control. 
Authorization in Lampson’s access matrix model is determined by access rights 
(such as r for read and w for write) in the cells of an access matrix. An example 
of an access matrix is shown in figure 1. Here subject U can read and write file 
F but only read file G. Subject V can read and write file G but has no access to 
file F. A review of the essential concepts of access control is available in [25]. 



          F       G

   U     r w      r

   V              r w

Fig. 1. Example of an Access Matrix.

The access matrix of figure 1 can be easily depicted as a directed graph with 
labelled edges as shown in figure 2. This explains the intuitive feeling that there is a
strong connection between graphs and access control. For convenience, we will
henceforth talk of the access matrix and access graph as equivalent notions. 
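
The equivalence between the two views can be made concrete with a small sketch. The dictionary layout and the function names below are assumptions chosen for illustration; only the rights themselves come from figure 1. Each non-empty right in a cell becomes one labelled edge from the subject to the object.

```python
# Access matrix of figure 1: rows are subjects, columns are objects.
# The nested-dictionary layout is an assumption made for this sketch.
access_matrix = {
    "U": {"F": {"r", "w"}, "G": {"r"}},
    "V": {"F": set(), "G": {"r", "w"}},
}


def matrix_to_graph(matrix):
    """Return the access graph as a set of (subject, right, object) edges."""
    edges = set()
    for subject, row in matrix.items():
        for obj, rights in row.items():
            for right in rights:
                edges.add((subject, right, obj))
    return edges


def graph_to_matrix(edges, subjects, objects):
    """Rebuild the matrix from the edges; empty cells are kept explicitly."""
    matrix = {s: {o: set() for o in objects} for s in subjects}
    for subject, right, obj in edges:
        matrix[subject][obj].add(right)
    return matrix


access_graph = matrix_to_graph(access_matrix)
```

Converting back with `graph_to_matrix(access_graph, ["U", "V"], ["F", "G"])` yields the original matrix, which is the sense in which the two notions can be treated as interchangeable.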



Access Control Models 

A static access graph is not very interesting. Real computer systems are highly
dynamic in that the access rights of subjects to objects change over time and
new subjects and objects (and thereby new rights) are created and existing ones
deleted. In terms of the access matrix this means that not only can contents of
existing cells be changed but new rows and columns can be created and existing
ones destroyed. In terms of the access graph, in addition to edge adding and
deleting operations new nodes can be created and existing ones deleted.

Fig. 2. Example of an Access Graph.

Fig. 3. Owner-Based Discretionary Access Control.

An access control model specifies the operations by which the access graph 
can be changed. These operations are typically authorized by existing rights in 
the access graph itself. A common example of this is the “own” right shown in 
figure 3. The owner of a file has the own right for it and can add and delete 
rights for that file at the owner’s free discretion. Thus subjects U and V control 
the rights of all subjects to files F and G respectively, i.e. , U and V control the 
addition and deletion of edges labelled r or w terminating in F and G respectively. 
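
A minimal sketch of this owner-based policy follows. The class name `AccessGraph` and its methods are hypothetical and not part of any model cited here; the sketch only assumes that adding or deleting an r or w edge ending in a file requires the acting subject to hold the own right for that file.

```python
class AccessGraph:
    """Access graph as a set of (subject, right, object) edges (illustrative)."""

    def __init__(self, edges):
        self.edges = set(edges)

    def has(self, subject, right, obj):
        return (subject, right, obj) in self.edges

    def _check_owner(self, actor, right, obj):
        # Only r and w edges are under the owner's discretion in this sketch.
        if right not in {"r", "w"}:
            raise ValueError("only r and w can be granted or revoked")
        if not self.has(actor, "own", obj):
            raise PermissionError(f"{actor} does not own {obj}")

    def grant(self, actor, subject, right, obj):
        """Add an edge labelled `right` from `subject` to `obj`."""
        self._check_owner(actor, right, obj)
        self.edges.add((subject, right, obj))

    def revoke(self, actor, subject, right, obj):
        """Delete an edge labelled `right` from `subject` to `obj`."""
        self._check_owner(actor, right, obj)
        self.edges.discard((subject, right, obj))


# U owns F and V owns G, as in figure 3.
graph = AccessGraph({("U", "own", "F"), ("V", "own", "G")})
graph.grant("U", "V", "r", "F")  # allowed: U owns F
```

An attempt such as `graph.grant("V", "V", "w", "F")` raises `PermissionError`, because V holds no own edge to F.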

The policy of owner-based discretionary access control is certainly reasonable 
but researchers quickly realized that there are many other policies of practical 
interest. For example, can the “own” right itself be granted? Some systems do 
not allow this. The creator of a file becomes its owner and remains its owner 
thereafter. Other systems allow ownership to be propagated from one subject to 
another. Some of these allow multiple simultaneous ownership, while others allow
only one owner at a time. There does not appear to be any single policy that 
will be universally applicable. Hence the need for flexible access control models 
in this regard. 

The Safety Problem 

In a seminal paper Harrison, Ruzzo and Ullman [8] proposed a simple language 
for stating the rules by which an access graph can be changed. The resulting 
model is often called HRU. They then posed the safety problem as follows¹.

Given an initial access graph and a fixed set of rules for making au- 
thorized changes to it, is it possible to reach an access graph in which 
subject X has a right to object Y (i.e., there is an edge labelled a from 
X to Y)? 

It turns out that safety is undecidable in the HRU model. Surprisingly, the quest
to find useful models with efficiently decidable safety proved to be quite
challenging. Although significant positive results have appeared over the years, an
appropriate balance between safety and flexibility remains a challenge for access
control models. The role of graph theory in progress on the safety problem is 
discussed in section 2. 

Dynamic Hierarchies 

Most practical access control systems go beyond the simple access graph we have 
discussed to provide a role (or group) construct. Thus subjects not only get the 
rights that they individually are granted but also acquire rights granted to roles 
(or groups) that they are a member of. For example, figure 4 shows an access 
graph in which subject U is a member of role G which in turn has the rights r 
and w for file F. Thereby U is authorized to read and write file F².

Roles are a powerful concept in aggregating permissions and simplifying their 
administration [5, 26]. Modern access control systems are typically role-based be- 
cause of the power and flexibility of this approach. Roles are often organized into 
hierarchies as shown, for example, in figure 5. This is a Hasse diagram of a par- 
tial order where senior roles are shown towards the top and junior ones towards 
the bottom. The Supervising Engineer role inherits all permissions of its junior 
roles, thus this role can do everything that the junior roles can do plus more. 
Conversely, a user who is a member of a senior role is also considered to be a 
member of the junior roles. In other words permissions are inherited upwards 
in the hierarchy and membership is inherited downwards. In practice role hi- 
erarchies need to change and evolve over time. How to do this effectively is a 
challenging problem for role administration. The application of graph transfor- 
mations in progress on this issue is discussed in section 3. 
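
The two directions of inheritance can be sketched as follows. The role "Employee", the permission names, and the users are hypothetical and added only for illustration; the sketch assumes the hierarchy is given by the edges of its Hasse diagram, from each role to its immediate juniors.

```python
# Senior role -> immediate junior roles (edges of the Hasse diagram).
# "Employee" and all permission and user names are hypothetical.
juniors = {
    "Supervising Engineer": {"Engineer"},
    "Engineer": {"Employee"},
    "Employee": set(),
}
role_permissions = {
    "Supervising Engineer": {"approve"},
    "Engineer": {"write"},
    "Employee": {"read"},
}
user_roles = {"alice": {"Supervising Engineer"}, "bob": {"Employee"}}


def role_and_juniors(role):
    """Return the role together with every role junior to it."""
    seen, stack = set(), [role]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(juniors[current])
    return seen


def effective_roles(user):
    """Membership is inherited downwards: senior members are junior members."""
    result = set()
    for role in user_roles[user]:
        result |= role_and_juniors(role)
    return result


def effective_permissions(user):
    """Permissions are inherited upwards: seniors hold their juniors' permissions."""
    permissions = set()
    for role in effective_roles(user):
        permissions |= role_permissions[role]
    return permissions
```

Here `effective_permissions("alice")` contains approve, write and read, while `effective_permissions("bob")` contains only read.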

1 The original HRU formulation is in terms of the access matrix but is easily restated 
as done here in terms of the access graph. 

2 This can be shown in the access graph by a “temporary” edge labelled r, w directed 
from U to F. 
[Figure: role hierarchy diagram; recoverable node labels: Supervising Engineer, Engineer]

Fig. 5. An Example of Hierarchical Roles.



2 Graphs and the Safety Problem 

The take-grant model of Lipton and Snyder [15] is among the earliest applications 
of graph theory to access control. This model was developed in reaction to the 
undecidable safety results of the HRU model. It takes its name from the two 
rights it introduces, t for take and g for grant. The depiction of these two rights 
in the access graph is shown in figure 6. The notation B/t ∈ dom(A) denotes the
possession of the B/t capability in A's domain, and is equivalent to stating
that t is in the [A,B] cell of the access matrix. Similarly for B/g ∈ dom(A). The take right
in figure 6(a) enables any right that B has to be copied to A. That is any edge 
originating at B can be duplicated with the same label and termination node but 
originating at A. The grant right in figure 6(b) conversely enables any right that 
A has to be copied to B. That is any edge originating at A can be duplicated 
with the same label and termination node but originating at B. 
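
The two rules can be written as edge-duplication operations on the access graph. This is a sketch under the assumption that the graph is a set of (source, right, target) edges; the function names and the example object O are illustrative.

```python
def take(edges, a, b, right, target):
    """If a holds t on b, copy b's edge (b, right, target) so it starts at a."""
    if (a, "t", b) not in edges or (b, right, target) not in edges:
        raise PermissionError("take is not authorised")
    return edges | {(a, right, target)}


def grant(edges, a, b, right, target):
    """If a holds g on b, copy a's edge (a, right, target) so it starts at b."""
    if (a, "g", b) not in edges or (a, right, target) not in edges:
        raise PermissionError("grant is not authorised")
    return edges | {(b, right, target)}


# Figure 6(a): A holds t on B, and B can read a hypothetical object O.
after_take = take({("A", "t", "B"), ("B", "r", "O")}, "A", "B", "r", "O")

# Figure 6(b): A holds g on B, and A can read O.
after_grant = grant({("A", "g", "B"), ("A", "r", "O")}, "A", "B", "r", "O")
```

In the first case A ends up with an r edge to O; in the second case B does. Both rules only add edges, so the original edge is kept.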

Somewhat surprisingly it turns out that the flow of rights in the take-grant 
model is symmetric. This allows for efficient safety analysis in the model but 
severely limits its expressive power. The original formulation of the take-grant 
model depicted the take and grant rights in the access graph as shown in figure 6. 
(a) B/t ∈ dom(A)    (b) B/g ∈ dom(A)

Fig. 6. Transport of Rights in the Take-Grant Model: The Original Access Graph View.



Lockman and Minsky [16] observed that a slightly different graph representation 
would demonstrate the symmetry of take-grant much more easily. They proposed 
to represent the ability for rights to flow from A to B by a directed edge from 
A to B labelled can-flow. The two situations of figure 6 are respectively shown 
in figure 7 in this modified representation. The focus of this representation is on 
the flow of rights rather than on the underlying right that enables the flow. 




(a) B/t ∈ dom(A)    (b) B/g ∈ dom(A)

Fig. 7. Transport of Rights in the Take-Grant Model: The Modified Can-Flow View.



To complete the description of the take-grant model, figure 8 shows the create
operation using both styles of representation³. The diagram shows the
result of A creating a new subject A'. A gets the A'/t and A'/g rights, thus
enabling can-flow in both directions.

3 The take-grant model also includes revoke and destroy operations. We omit their
definition since they are not relevant here.

(a) The Original View    (b) The Modified View

Fig. 8. Creation in the Take-Grant Model.

Fig. 9. Reversal of Flow in the Take-Grant Model.

With the modified can-flow representation, symmetry of flow of rights in the
take-grant model is easily demonstrated in figure 9. Figure 9(a) shows the initial
situation with can-flow from A to B. This flow can be authorized by B/g ∈ dom(A)
or by A/t ∈ dom(B). The specific authorization is not material. The first step is
for A to create A' as shown in figure 9(b) with the resulting can-flow edges
from A to A' and vice versa. These edges are authorized by A'/t ∈ dom(A) and
A'/g ∈ dom(A). In particular, A'/g can be moved from dom(A) to dom(B) by
virtue of the can-flow from A to B. This gives us the situation with can-flow
from B to A' shown in figure 9(c). In conjunction with the can-flow from A' to
A there is now a flow from B to A, thus reversing the original flow from A to B⁴.
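
The reversal argument can be replayed step by step. The sketch below assumes the initial flow is authorized by B/g ∈ dom(A), stores rights as (source, right, target) edges, and derives the can-flow edges from them; all function names are illustrative.

```python
def create(edges, a, new):
    """a creates a new subject and receives the t and g rights on it."""
    return edges | {(a, "t", new), (a, "g", new)}


def grant(edges, a, b, right, target):
    """If a holds g on b, copy a's edge (a, right, target) so it starts at b."""
    if (a, "g", b) not in edges or (a, right, target) not in edges:
        raise PermissionError("grant is not authorised")
    return edges | {(b, right, target)}


def can_flow(edges):
    """Derive can-flow edges: g lets rights flow forward, t lets them flow back."""
    flows = set()
    for source, right, target in edges:
        if right == "g":
            flows.add((source, target))
        elif right == "t":
            flows.add((target, source))
    return flows


def reaches(flows, start, goal):
    """True if rights can flow from start to goal along can-flow edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(dst for src, dst in flows if src == node)
    return False


step_a = frozenset({("A", "g", "B")})           # figure 9(a): can-flow A to B
step_b = create(step_a, "A", "A'")              # figure 9(b): A creates A'
step_c = grant(step_b, "A", "B", "g", "A'")     # figure 9(c): A'/g moves to dom(B)
```

In `step_a` there is no flow from B to A. In `step_c` the can-flow edges B to A' and A' to A exist, so `reaches(can_flow(step_c), "B", "A")` holds.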

I came across these constructions during my doctoral research. I believe they 
demonstrate a fundamental truth. Graph representations are flexible and it is 
important to capture the correct properties in the edges and nodes of the graph. 
My subsequent work on the safety problem resulted in a number of models [1, 22,
24]. Some elements of graph theory are used in the analysis of these models but
there is probably a stronger tie with existing graph theory results. So there is
potential for exploring a deeper connection here. Based on the discussion above 
much will depend upon a suitable representation of the access control problem 
in graph edges and nodes. Jaeger and Tidswell [6] have used graph notation to 
capture constraints and argue that this approach may lead to practical safety 
results. Nyanchama and Osborn have also discussed the representation of
conflict of interest policies in their role-graph model [18]. Also recently Koch et
al. [9-12] have developed safety results directly based on the theory of graph
transformations. Reconciliation of these results with the known safety results 
for access control models would be a step forward in understanding the insights 
that graphs and their transformations can provide in this domain. 

3 Graphs and Dynamic Hierarchies 

The use of role (or group) hierarchies in access control has a long history and 
considerable motivation for simplifying management of rights. The current com- 
mercial success of role-based access control products that support hierarchies is 
testimony to this fact. Mathematically a hierarchy is a partial order, that is a re- 
flexive, transitive and anti-symmetric binary relation on roles. In this section we 
briefly look at two lines of research dealing with dynamic hierarchies for access 
control. 

A particular kind of hierarchy called an ntree was introduced by this au- 
thor [23]. The ntree has some very appealing properties for access control in- 
cluding the fact that it is a dimension 2 partial order, so it can be represented 
as the intersection of two linear orders. This allows us to label each node n with 
a pair of natural numbers l(n) and r(n), such that u<v if and only if l(u)<l(v) 
and r(u)<r(v). The ntree also has a recursive definition based on refining an 
existing node into another ntree, with the base case being a forest of trees and 
inverted trees. There are efficient algorithms for recognizing whether or not a 
given hierarchy is an ntree. One of the open questions regarding ntrees is how 

4 Lockman and Minsky [16] went on to consider the grant-only and take-only models 
with the former having the symmetric flow property of take-grant but the latter 
allowing asymmetric flow. 
to recognize hierarchies that are close to being ntrees and could benefit from an
ntree representation augmented with some additional information. More
generally, research on hierarchies with special properties that are attractive for access
control can be pursued.
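
The pair labelling described above for dimension 2 partial orders can be illustrated on a small example. The four-role hierarchy below is hypothetical and is not claimed to be an ntree; it only shows how two linear orders yield labels l(n) and r(n) such that u<v exactly when both components increase.

```python
def labels_from_linear_orders(order_l, order_r):
    """Label each node with its 1-based position in the two linear orders."""
    position_l = {node: i + 1 for i, node in enumerate(order_l)}
    position_r = {node: i + 1 for i, node in enumerate(order_r)}
    return {node: (position_l[node], position_r[node]) for node in order_l}


# Hypothetical hierarchy: "left" and "right" are incomparable, so they
# appear in opposite relative positions in the two linear orders.
labels = labels_from_linear_orders(
    ["bottom", "left", "right", "top"],
    ["bottom", "right", "left", "top"],
)


def is_junior(u, v):
    """u < v if and only if l(u) < l(v) and r(u) < r(v)."""
    (l_u, r_u), (l_v, r_v) = labels[u], labels[v]
    return l_u < l_v and r_u < r_v
```

With these labels a seniority test is two integer comparisons, with no traversal of the hierarchy: `is_junior("left", "top")` holds, while neither `is_junior("left", "right")` nor `is_junior("right", "left")` does.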

The need for dynamic role hierarchies as part of the administration of role- 
based access control (RBAC) is motivated in the ARBAC97 model [27]. The 
fundamental question is how to allow localized evolution of a hierarchy without 
disrupting larger global relationships. The authors of ARBAC97 suggest the no- 
tion of an encapsulated range as the basis for determining a suitable unit for 
local modifications. Crampton and Loizou [3, 4] point out some problems with 
this notion and propose a mathematically better founded notion of
administrative scope. Koch et al. [13] discuss administrative scope in their graph-based
approach to access control and provide an operational semantics for it. The issue
of evolving hierarchies and reconciling pre-existing hierarchies is likely to grow 
in importance as enterprises deploy RBAC across multiple business units and 
business partners. 

4 Conclusion 

In this paper we have briefly explored the connection between graphs and access 
control models, focusing on the safety problem and on dynamic role hierarchies. 
There is a long history of attempts to apply graph theory to these problems. 
Much of the earlier work is based on first principles. In recent years we have 
seen a more direct application of graph theory results. There is strong potential 
in further exploring this connection. 

The area of access control and authorization has had a resurgence of interest 
in recent years. Although the access matrix model has served as a reasonable 
foundation for access control research and practice it has become considerably 
dated. With the Internet explosion many new forms of access control are being 
deployed in various e-commerce scenarios. There is increasing realization that 
the foundations of access control need a deeper and richer model. A number 
of authors have proposed various extensions to traditional access control. Park 
and Sandhu [19, 28] recently proposed a unified model for next generation access 
control called usage control or UCON. Initial efforts to formalize this model have 
taken a logic-based approach [30]. It would be interesting to see how graphs and 
their transformations can be applied to UCON models. The framework of UCON 
is very rich so there is likely to be some aspect of UCON that can benefit from 
a graph-based formal foundation. 
1 Introduction

With the advent of the Model-Driven Architecture (MDA) [3] there is signifi- 
cant interest in the development and application of transformation languages. 
MDA recognises that systems typically consist of multiple models (possibly ex- 
pressed in different modelling languages and at different levels of abstraction) 
that are precisely related. The relationship between these different models can 
be described by transformations (or mappings). 

An important emerging standard for transformations is QVT (Queries, Views, 
Transformations). This standard, being developed by the Object Management 
Group (OMG), aims to provide a language for expressing transformations be- 
tween models that are instances of the MOF (Meta Object Facility) [2] meta- 
model. The MOF is a standard language for expressing meta-data that is be- 
ing used as the foundation for expressing language metamodels (models of lan- 
guages). Because of the generic nature of MOF, it is also being used as the means 
of expressing the QVT language itself. 

When the QVT process began (over two years ago), the task initially seemed 
quite straightforward. After all, many different transformation languages were 
already described in the literature, and it was felt that it would be straight- 
forward to design such a language for MOF. Unfortunately, this has not been 
the case. Two key issues have made the task of designing such a language much 
harder. In the remainder of this paper we will examine these issues and propose 
a solution that is applicable across a wide variety of language definitions. 

2 Design Issues 

The first issue that impacts the design of a transformation language relates to 
transformation languages themselves. In practice, it turns out that there are
many different flavours of transformation languages. Some of the choices of lan- 
guage features include: 

— Declarative vs. Imperative : at what level of abstraction should transforma- 
tions be expressed? Declarative languages enable transformations to be ex- 
pressed in a more concise fashion, yet may suffer from being inefficient (or 
impossible) to implement. 
— Compositionality: whilst a transformation language should ideally be com-
positional, this is more readily achieved by the use of more declarative prim- 
itives, thus invoking the declarative vs imperative issue (above). 

— Patterns vs. Actions: patterns are widely used as a declarative but exe- 
cutable abstraction for describing transformations (XSLT is a good example 
of this). Yet, should a transformation language be completely pattern based,
or should a mixed language, with imperative actions, be permitted for
practicality?

— Unidirectional vs. Bi-directional: it is clear that there is a strong distinction 
between one-stop, unidirectional transformations, and bi-directional map- 
pings that keep two models in sync. Should both be accommodated? 

These, and many other choices make the decision process a difficult one for 
the designers of transformation languages. One approach to tackling the problem 
is to attempt to mudpack all of the different features into a single language. 
However, there is clearly a danger of producing an overly complex language. On 
the other hand, choosing a subset of the features will clearly omit use cases of 
the language that may be relevant to users of the language. 

The second issue relates to MOF itself. During the QVT process, it has 
become more apparent than ever that current metamodelling practice is too 
weak. In particular, the standard approach to metamodelling, in which the main 
focus is on capturing the static properties of a language (i.e. the abstract syntax) 
does not enable two critical aspects of language design to be expressed: semantics 
(what the language does and means) and concrete syntax (how the language is 
represented). Thus, in order to describe these aspects, the design team must rely 
on informal textual descriptions or bespoke implementations. In the latter case, 
this often results in ‘analysis paralysis’, as there is insufficient information to 
validate the correctness of the design. 

Clearly, in the context of an international standardisation process, this is not 
satisfactory. In particular, it will be difficult to ensure that implementations of 
the standard are conformant as there will be gaps in the definition that will be 
filled in by vendors in different ways. 

3 The Way Forward 

In order to fully address the needs of QVT and transformation language design 
in general, it is clear to us that two key changes are required. Firstly, the bar 
must be raised in the way in which we metamodel languages. Rather than just 
capturing abstract syntax, the metamodelling language must be rich enough to 
capture all aspects of a language, including concrete syntax, abstract syntax 
and semantics. This information should be sufficient to rapidly generate tools 
that implement the language and allow its properties to be fully explored and 
validated. 

Secondly, it must be recognised that there is no single, all encompassing 
transformation language. Instead we must be prepared to embrace a diversity of 
languages, each with specific features. Furthermore, the standard must have the 
flexibility to accommodate this diversity in an interoperable manner, enabling 
different features to be mixed and matched as required. 
At first, these proposals appear to be unconnected. Yet, they are in fact
closely related. In practice, we have found that the richer the capability for 
expressing metamodels, the greater the flexibility and interoperability of the 
resulting language designs. This occurs because the metamodels capture cohesive 
language units that can be readily integrated within other languages. 

4 XMF 

We have constructed an approach (and associated tools) for language metamod- 
elling that aims to realise these goals. This approach is based on what we call 
an executable Metamodelling Framework (XMF). The basic philosophy behind 
XMF is that many different languages can be fully described via a metamodelling 
architecture that supports the following: 

— A platform independent virtual machine for executing metamodels. 

— A small, precise, executable metamodelling language that is bootstrapped 
independently of any implementation technology. This supports a generic 
parsing and diagramming language, a compiler and interpreter, and a collec- 
tion of core executable MOF modelling primitives called XOCL (executable 
OCL). 

— A layered language definition architecture, in which increasingly richer lan- 
guages and development technologies are defined in terms of more primitive 
languages via operational definitions of their semantics or via compilation 
to more primitive concepts. 

— Support for the rapid deployment of metamodels into working tools. This 
involves linking executing metamodels with appropriate user-interface tech- 
nology. 

Using this architecture we have implemented many different modelling lan- 
guages and development technologies for industrial clients. We have used exactly 
the same approach in the definition of transformation languages. Firstly, some 
core transformation language abstractions were implemented. These included a 
pattern matching language and synchronisation language. Two transformation 
languages were then defined on top of these. The first, XMap, provides a language 
for generative transformations based on pattern matching. XOCL is integrated 
in the language, thus enabling mixed declarative and imperative mappings. The 
second, XSync, supports the dynamic, bi-directional synchronisation of models, 
this time using XOCL as a means of writing the synchronisation rules. 

In the following sections, we first give an example of one of the languages,
XMap, and then describe how the language is defined using a metamodel. 

5 XMap Example 

The example defines a mapping between two models: a simple model of state 
machines, and a simple model of C++. The simple state machines model is 
shown below: 
Both states and transitions are labelled with a name. A transition relates a
source state to a target state. 

The following model captures the basic features of C++. Note that the body 
of an operation is a string. However, if necessary the expressions and statements 
could also be modelled. 




A mapping from the StateMachine model to the C++ model can now be
written. It maps a StateMachine to a C++ class, where each state in the state 
machine is mapped to a value in an enumerated type called STATE. Each tran- 
sition in the state machine is mapped to a C++ operation with the same name 
and a body, which changes the state attribute to the target of the transition. 

The mapping can be modelled in XMap as shown below. The arrows represent 
mappings between elements of the two languages. The first mapping, SM2Class, 
maps a state machine to a C++ class. The second mapping, Transition2Op,
maps a transition to an operation. 
[Figure: XMap mapping diagram. Recoverable labels: StateMachine (domain), Transition (domain), Transition2Op, CPPOp (range).]


In order to describe the details of the mapping, XMap uses a textual map- 
ping language based on pattern matching. As an example, the definition of the 
mapping between a transition and an operation is as follows: 

context Transition2Op 
  @Clause Transition2Op 
    Transition 
      [name = N, 
       target = T] 
    do 
      CPPOp 
        [name = N, 
         body = B] 
    where 
      B = "state = " + T.name 
  end 

A mapping consists of a collection of clauses, which are pattern matches 
between source and target objects. Whenever a source object is successfully 
matched to the input of the mapping, the resulting object in the do expression 
is generated. Variables can be used within clauses, and matched against values 
of slots in objects. Because XMap builds on XOCL, XOCL expressions can be 
used to capture complex relationships between variables. 

In this example, whenever the mapping is given a Transition with a name 
equal to the variable N and a target equal to T, it will generate an instance of 
the class CPPOp, whose name is equal to N and whose body is equal to the 
variable B. The where clause is used to define values of variables, and it is used 
here to define the variable B to be the concatenation of the text “state = ” with the 
target state name. For instance, given a transition between the states “On” and 
“Off”, the resulting operation will have the body “state = Off”. 
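
The behaviour of this clause can be imitated in a general-purpose language. The 
following Python sketch is only an illustration and not part of XMap or XMF: the 
dataclasses and the function name transition2op are assumptions, and the 
transition name "switchOff" is made up for the example. 

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the model classes used by the clause.
@dataclass
class State:
    name: str


@dataclass
class Transition:
    name: str
    source: State
    target: State


@dataclass
class CPPOp:
    name: str
    body: str


def transition2op(t: Transition) -> CPPOp:
    # Pattern: Transition[name = N, target = T]
    n, target = t.name, t.target
    # where B = "state = " + T.name
    b = "state = " + target.name
    # do CPPOp[name = N, body = B]
    return CPPOp(name=n, body=b)


op = transition2op(Transition("switchOff", State("On"), State("Off")))
print(op.body)  # prints: state = Off
```

The pattern variables N, T and B become ordinary local variables here; in XMap 
they are bound by matching rather than by assignment. 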

6 XMap Metamodel 

The architecture of the XMap language metamodel is described in the figure 
below. 

The syntax metamodel captures the concrete syntax representation of the 
mapping language. There are three ways this can be achieved: 

- A textual syntax can be defined for the language by constructing a grammar 
that states how a textual representation of models written in XMap can be 
parsed into an instance of the concepts represented in the syntax definition. 
This is achieved in XMF using XBNF: an extended BNF grammar language 
that provides information about how to turn textual elements into instances 
of XMF elements. 

- A graphical syntax can be defined by defining a mapping from a model of 
the graphical syntax of the language to concepts in the syntax domain. 

- A mixed approach can be used, in which both graphical and textual elements 
are defined. This is the approach taken with the XMap language. 



6.1 Textual Syntax Metamodel 

As an example of a textual syntax metamodel, the following fragment of XBNF 
defines the rules for parsing a clause into an instance of a Clause class: 

@Class Clause 
  @Grammar extends OCL::OCL.grammar 
    Clause ::= name = Name 
      patterns = ClausePatterns 'do' body = Exp { 
        Clause(name, patterns, body) }. 
    ClausePatterns ::= p = Pattern 
      ps = (',' Pattern)* { Seq{p | ps} }. 
    ClauseBindings ::= 'where' Bindings | {Seq{}}. 
  end 
end 

The grammar extends the OCL grammar with the concept of a Clause, where 
a Clause is defined to be a name, followed by a collection of patterns, followed 
by a ‘do’ and a body, which can be an expression, and an optional collection of 
‘where’ bindings. The result of matching any textual input of this form: 

@Clause <name> 
  <patterns> 
do 
  <body> 
where 
  <bindings> 
end 

is to create an instance of a Clause, passing it the definitions of name, patterns 
and bindings. Of course, further XBNF definitions will be needed that define the 
grammar rules for Pattern, Binding, etc. These are omitted for brevity. 

6.2 Diagram Syntax Metamodel 

As an example of a metamodel of diagrammatical syntax, the following diagram 
describes a part of the syntax of XMap mapping diagrams. A mapping diagram 
extends a class diagram with MapNodes (uni-directional arrows). 




The relationship between a MapNode and a mapping (denoted here by the 
class Map) is kept constantly in step via the MapXNode mapping. The body of 
this mapping is written in XSync, thus synchronising the relevant aspects of the 
two elements. For instance, the name of the Map and the MapNode must always 
be kept in step. 

6.3 Semantic Domain 

The syntax of the language can be viewed as syntactic sugar for concepts in 
the semantic domain. A semantic domain expresses the meaning of the concepts 
in terms of more primitive, but well-defined concepts. A semantics is thus defined 
for XMap via a translation from the syntactical representation of a mapping into 
a semantic domain model. 

The semantic domain model for the XMap language is described by the 
following model: 

Here, a Map denotes a mapping. It is a subclass of the class Class and 
therefore inherits all the properties of the class Class. It can therefore be 
instantiated and define operations that can be executed. In addition, a Map has a 
domain and range, and a sequence of clauses. 

The translation step that is performed is to translate each clause of a mapping 
into a case statement belonging to a distinguished operation of the class Map. 
Each case statement can contain an XOCL expression. Note that in this case, 
XOCL has also been extended with patterns, thus enabling values to be matched 
against other values. 

As an example, consider a mapping with the following clauses: 

@Clause Clause0 
  Transition 
    [name = ""] 
  do 
    CPPOp 
      [name = "Empty"] 
end 

@Clause Clause1 
  Transition 
    [name = N] 
  do 
    CPPOp 
      [name = N] 
end 

This would be translated into the following operation of the class Map: 

@Operation invoke(): Element 
  @Case of 
    Transition[name = N] do 
      CPPOp[name = N] 
    end 

    Transition[name = ""] do 
      CPPOp[name = "Empty"] 
    end 

    else self.error("Mapping failed.") 
  end 
end 

Running this operation will thus execute the appropriate case statements 
and perform the mapping. 
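
The idea of turning clauses into the cases of a single operation can be sketched 
in Python as follows. This is an illustrative analogue, not XOCL: the pattern 
functions, the CASES list and the first-match-wins evaluation order are 
assumptions of the sketch. The cases are listed in the order in which the clauses 
were declared, so that the more specific empty-name case is tried first. 

```python
from dataclasses import dataclass


@dataclass
class Transition:
    name: str


@dataclass
class CPPOp:
    name: str


def clause0_pattern(value):
    # Transition[name = ""]: matches only transitions with an empty name.
    if isinstance(value, Transition) and value.name == "":
        return {}
    return None


def clause1_pattern(value):
    # Transition[name = N]: matches any transition and binds N.
    if isinstance(value, Transition):
        return {"N": value.name}
    return None


# Each case pairs a pattern with a body that builds the result.
CASES = [
    (clause0_pattern, lambda bindings: CPPOp("Empty")),
    (clause1_pattern, lambda bindings: CPPOp(bindings["N"])),
]


def invoke(value):
    # Assumed semantics: the first case whose pattern matches is executed.
    for pattern, body in CASES:
        bindings = pattern(value)
        if bindings is not None:
            return body(bindings)
    raise ValueError("Mapping failed.")
```

With these definitions, invoke(Transition("")) yields CPPOp("Empty"), while 
invoke(Transition("go")) yields CPPOp("go"). 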

The way in which the translation from syntax to semantic domain occurs is 
a matter of choice. At the diagram level, the desugaring is maintained by the 
synchronised mapping between the diagram and the semantic domain model. At 
the syntax level, a desugar() operation can be added to the grammar to tell it 
how to construct the appropriate case statement. 

7 Conclusion 

Our approach to defining transformation languages is to use rich metamodels 
to capture all aspects of their definition, including syntax and semantics. A 
key property is that definitions of existing languages and technologies (such as 
XOCL) can be merged in with the new language, creating richer, more expressive 
capabilities. 

The result is a precise definition that is platform independent (no reliance on 
external technology), transparent (the entire definition, including its semantics, 
can be traced back through the metamodel architecture), extensible and 
interoperable (new features can be added by adding new language components), and 
executable (enabling the language to be tested and validated). 

There has been much recent interest in the design of domain specific lan- 
guages [1], and the approach described in this paper offers a scalable solution 
to the problem of how to generate new languages and tools that support those 
languages in a generic fashion. 

In summary, our position is that a crucial step in the design of transformation 
languages must be the adoption of more complete and semantically rich 
approaches to metamodelling. 

Abstract. Development processes in engineering disciplines are inherently 
complex. Throughout the development process, different kinds of inter-dependent 
design documents are created which have to be kept consistent with each other. 
Graph transformations are well suited for modeling the operations provided for 
maintaining inter-document consistency. In this paper, we describe a novel 
approach to rule execution for graph-based integration tools operating interactively 
and incrementally. Rather than executing a rule in an atomic way, we break rule 
execution up into multiple phases. In this way, the user of an integration tool may 
be informed about all potential rule applications and their mutual conflicts so that 
(s)he may make a judicious decision on how to proceed. 



1 Introduction 

Development processes in engineering disciplines are inherently complex. Through- 
out the development process, different kinds of inter-dependent documents are created 
which have to be kept consistent with each other. For example, in software engineer- 
ing there are requirements definitions, software architectures, module bodies, etc. which 
describe a software system from different perspectives and at different levels of abstrac- 
tion and granularity. Documents are connected by manifold dependencies and need to 
be kept consistent with each other. For example, the source code of a software system 
must match its high-level description in the software architecture. 

Development processes may be viewed as multi-stage transformation processes 
from the initial problem statement to the final solution. Throughout the 
transformation process, many interacting decisions have to be made. These decisions 
can be automated only to a limited extent; in many settings, human interactions are 
required. Moreover, transformation rarely proceeds stage-wise according to some 
waterfall model. Rather, incremental and iterative processes have been proposed, 
which require changes to be propagated throughout a set of inter-dependent documents. 

In such a setting, there is a need for incremental and interactive integration tools for 
supporting inter-document consistency maintenance. An integration tool has to manage 
links between parts of inter-dependent documents. These parts are called increments in 
the sequel. The tool assists the user in browsing (traversing the links in order to 
navigate between related increments in different documents), consistency analysis 
(concerning the relationships between the documents’ contents), and transformations 
(of the increments contained in one document into corresponding increments of the 
related document). 

Graphs and graph transformations have been used successfully for the specification 
and realization of integration tools [1,2]. However, in the case of incremental and 
interactive integration tools, specific requirements have to be met concerning the 
execution of integration rules. In this paper, we describe a novel approach to rule 
execution for graph-based integration tools operating incrementally and interactively. 
We have realized this approach, which is based on triple graph grammars [3, 4], in a 
research prototype called IREEN, an Integration Rule Evaluation Environment [5]. 
Rather than executing a rule in an atomic way, IREEN breaks rule execution up into 
multiple phases. In this way, the user of an integration tool may be informed about all 
potential rule applications and their mutual conflicts so that (s)he may make a 
judicious decision on how to proceed. 

The rest of this paper is structured as follows: Section 2 presents a scenario which 
motivates our work by a practical example. Section 3 is devoted to the graph-based spec- 
ification of integration tools. Section 4, the core part of this paper, presents our novel 
approach to rule execution. Section 5 discusses related work, and Section 6 presents a 
short conclusion. 



2 Scenario 

The research reported in this paper is carried out within the IMPROVE project [6], 
which is concerned with models and tools for design processes in chemical engineering. 
In this section, we present a small example which illustrates key features of incremental 
and interactive integration tools. This example is drawn from chemical engineering, 
but we could also have chosen an example from another engineering discipline (e.g., 
software engineering). 

In chemical engineering, the flow sheet acts as a central document for describing 
the chemical process. The flow sheet is refined iteratively so that it eventually describes 
the chemical plant to be built. Simulations are performed in order to evaluate design 
alternatives. Simulation results are fed back to the flow sheet designer, who annotates 
the flow sheet with flow rates, temperatures, pressures, etc. Thus, information is prop- 
agated back and forth between flow sheets and simulation models. Although the flow 
sheet plays the role of a master document, it may also happen that a simulation model 
is created first and the flow sheet is derived from the simulation model (reverse engi- 
neering). 

Unfortunately, the relationships between flow sheets and simulation models are not 
always straightforward. Different kinds of simulation models are created for different 
purposes. Often, simulation models have to be composed from pre-defined blocks 
which in general need not correspond 1:1 to structural elements of the flow sheet. Thus, 
maintaining consistency between flow sheets and simulation models is a demanding 
task requiring sophisticated tool support. 

Figure 1 illustrates how an incremental integration tool assists in maintaining 
consistency between flow sheets and simulation models. In general, flow sheets and 
simulation models are created by different users at different times with the help of 
respective tools; an integration tool is used to establish mutual consistency on demand. 
In a cooperation with an industrial partner, we studied the coupling of COMOS [7], an 
environment for chemical engineering which in particular offers a flow sheet editor, and 
Aspen Plus [8], an environment for performing steady-state and dynamic simulations. 

Fig. 1. Integration between flow sheet and simulation model. 

The chemical process taken as an example produces ethanol from ethene and water. 
Flow sheet and simulation model are shown above and below the dashed line, 
respectively. The integration document for connecting them contains links which are 
drawn on the dashed line. The figure illustrates a design process consisting of four steps: 

1. The simulation expert has already created a simulation model for a part of the 
chemical process (heating and reaction). The simulation model is composed of three 
blocks according to the capabilities of the respective simulation tool. 

2. The simulation model is transformed into a flow sheet. This is achieved with the 
help of an integration tool. Multiple alternatives are available for this 
transformation. It turns out that the simplest one (a 1:1 transformation) does not result 
in an adequate flow sheet because the blocks do not correspond 1:1 to devices in 
the flow sheet. Rather, the user decides to group two blocks and their connecting 
stream into a single device (a plug flow reactor, PFR) in the flow sheet. The link 
between the PFR and the respective parts of the simulation model is established by 
firing a corresponding integration rule. In addition, another rule is available which just 
transforms the block called RPlug into a PFR. This 1:1 rule stands in conflict with 
the rule selected here. The integration tool presents conflicting rules to the user, who 
may select the rule to be applied. 

3. Steps 3a and 3b are carried out in parallel, using different tools. Using the simu- 
lation model created so far, a simulation is performed in the simulation tool. The 
simulation results comprise flow rates, temperatures, etc. In parallel, a flow sheet 
editor is used to extend the flow sheet with the chemical process steps that have not 
been specified so far (flashing and splitting). 

4. Finally, the integration tool is used to synchronize the parallel work performed in 
the previous step. This involves information flow in both directions. First, the 
attributes containing the simulation results are propagated from the simulation model 
back to the flow sheet¹. Second, the extensions are propagated from the flow sheet 
to the simulation model. After these propagations have been performed, mutual 
consistency is re-established. 

From this example, we may derive several features of the kinds of integration tools 
that we are addressing. Concerning the mode of operation, our focus lies on incremental 
integration tools rather than on tools which operate in a batch-wise fashion. Rather than 
transforming documents as a whole, incremental changes are propagated (in general 
in both directions) between inter-dependent documents. Often, the integration tool 
cannot operate automatically; rather, the user has to make decisions interactively. In 
general, the user also maintains control over the time of activation, i.e., the integration 
tool is invoked to re-establish consistency whenever appropriate. Finally, it should be 
noted that integration tools do not merely support transformations. In addition, they 
are used for analyzing inter-document consistency or browsing along the links between 
inter-dependent documents. 

3 Graph-Based Specification of Integration Tools 

In complex scenarios such as the one described in the previous section, an integration 
tool needs to maintain a data structure storing links between inter-dependent 
documents. This data structure is called the integration document. Altogether, there are 
three documents involved: the source document, the target document, and the 
integration document. Please note that the terms “source” and “target” denote distinct 
ends of the integration relationship between the documents, but they do not necessarily 
imply a unique direction of transformation (in fact, transformations are performed in 
both directions in our sample scenario). 

All involved documents may be modeled as graphs, which are called source graph, 
target graph, and correspondence graph, respectively². Moreover, the operations 
performed by the respective tools may be modeled by graph transformations. Triple graph 
grammars [3] were developed for the high-level specification of graph-based 
integration tools. The core idea behind triple graph grammars is to specify the 
relationships between source, target, and correspondence graphs by triple rules. A triple 
rule defines a coupling of three rules operating on source, target, and correspondence 
graph, respectively. By applying triple rules, we may modify coupled graphs 
synchronously, taking their mutual relationships into account. 

An example of a triple rule is given in Figure 2 in PROGRES [10] syntax. The rule 
refers to the running example to be used throughout the rest of this paper, namely the 
creation of connections (appearing in both flow sheets and simulation models). In a flow 
sheet, a connection is used to relate structural elements such as devices and streams. An 
example of a device is a reactor; a stream is used to represent the flow of chemical 
substances between devices. In Figure 1, devices are represented as rectangles, and 
streams are shown as directed lines. Connections are not represented explicitly (rather, 
they may be derived from the layout), but they are part of the internal data model. Each 
device or stream has a set of ports; connections establish relationships between these 
ports. 

1 For a description of the attribute propagation mechanism please refer to [9]. 

2 If the tools operating on source and target document are not graph-based, the integration tool 
requires wrappers which establish corresponding graph views. 

[Figure 2: PROGRES production "transformation ConnectionSynchronous * =" with the 
columns flow sheet (source), correspondence, and simulation model (target). Recoverable 
nodes: '2 : ComosOutPort, '1 : subLink, '3 : AspenOutPort, '4 : ComosInPort, 
'6 : subLink, '5 : AspenInPort; recoverable edge labels: toComosIncrement, 
toAspenIncrement.] 

Fig. 2. Triple rule for a connection. 


The triple rule ConnectionSynchronous has a left-hand side (shown above 
the right-hand side) which spans all participating subgraphs: the source graph 
(representing the flow sheet) on the left, the correspondence graph in the middle, and the 
target graph (for the simulation model) on the right. The left-hand side is composed of 
port nodes in source and target graph, distinguishing between output ports and input 
ports³. Furthermore, it is required that the port nodes in both graphs correspond to each 
other. This requirement is expressed by the nodes of type subLink in the correspondence 
graph and their outgoing edges which point to nodes of the source and target graph, 
respectively. Port correspondences are established by other triple rules which transform 
the blocks the ports belong to, e.g. streams or devices. Correspondences between source 
and target patterns are represented by links and can be further structured by sublinks, 
e.g. to express port correspondences. 

All elements of the left-hand side re-appear on the right-hand side. New nodes are 
created for the connections in source and target graph, respectively, as well as for the 
link between them in the correspondence graph. The connection nodes are embedded 
locally by edges to the respective port nodes. For the link node, three types of adjacent 
edges are distinguished. toDominant edges are used to connect the link to exactly one 
dominant increment in the source and target graph, respectively. In general, the source 
and target pattern related through the triple rule may consist of more than one increment 
in each participating graph. Then, there are additional edges to normal increments (not 
needed in our running example)⁴. Finally, toContext edges point to nodes which are 
not themselves part of the transformation but are required as a context condition. These 
nodes are called context increments. 

3 Only ports of different orientation may be connected. 

Figure 2 describes a synchronous graph transformation. As already explained ear- 
lier, we cannot assume in general that all participating documents may be modified 
synchronously. In case of asynchronous modifications, the triple rule shown above is 
not ready for use. However, we may derive asynchronous rules from the synchronous 
rule in the following ways: 

- A forward rule assumes that the source graph has been extended, and extends the 
correspondence graph and the target graph accordingly. Thus, the forward rule de- 
rived from our sample rule would contain node 7 on the left-hand side. 

- Analogously, a backward rule is used to describe a transformation in the reverse 
direction. In our example, node 9 would be part of the left-hand side. 

- Finally, a consistency analysis rule is used when both documents have been mod- 
ified in parallel. In our running example, this means that connections have been 
inserted into both the flow sheet and the simulation model and a link is created a 
posteriori. Thus, the consistency analysis rule would include nodes 7 and 9 on the 
left-hand side. 
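
The three derivations above can be summarised in a small Python sketch. It is 
only an illustration under simplifying assumptions: a rule is reduced to sets of 
node identifiers, edges are ignored, and the class and function names are invented 
for this example. Node numbers 7 and 9 follow the text; using 8 for the link node 
is an assumption. 

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TripleRule:
    lhs: frozenset         # nodes that must exist before the rule applies
    new_source: frozenset  # nodes created in the source graph
    new_corr: frozenset    # nodes created in the correspondence graph
    new_target: frozenset  # nodes created in the target graph

    @property
    def created(self):
        return self.new_source | self.new_corr | self.new_target


def forward(rule):
    # The source part already exists; link and target part are created.
    return TripleRule(rule.lhs | rule.new_source, frozenset(),
                      rule.new_corr, rule.new_target)


def backward(rule):
    # The target part already exists; link and source part are created.
    return TripleRule(rule.lhs | rule.new_target, rule.new_source,
                      rule.new_corr, frozenset())


def consistency_analysis(rule):
    # Both parts already exist; only the link is created a posteriori.
    return TripleRule(rule.lhs | rule.new_source | rule.new_target,
                      frozenset(), rule.new_corr, frozenset())


sync = TripleRule(lhs=frozenset({1, 2, 3, 4, 5, 6}),
                  new_source=frozenset({7}),
                  new_corr=frozenset({8}),   # assumed id of the link node
                  new_target=frozenset({9}))
```

For instance, forward(sync).lhs contains node 7 but not node 9, and 
consistency_analysis(sync) creates the link node only. 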

Unfortunately, even these rules are not ready for use in an integration tool as 
described in the previous section. In the case of non-deterministic transformations 
between inter-dependent documents, it is crucial that the user is made aware of conflicts 
between applicable rules. Thus, we have to consider all applicable rules and their 
mutual conflicts before selecting a rule for execution. To achieve this, we have to give up 
atomic rule execution, i.e., we have to decouple pattern matching from graph 
transformation. 



4 Rule Execution 

4.1 Overview 

As explained in the previous section, an integration rule cannot be executed by means 
of a single graph transformation. To ensure the correct sequence of rule executions, to 
detect all conflicts between rule applications, and to allow the user to resolve conflicts, 
each integration rule is automatically translated to a set of graph transformations. These 
rule specific transformations are executed together with some generic ones following an 
integration algorithm. In this subsection, we will present the overall algorithm, while in 
the following subsections the phases of the algorithm are explained in detail, showing 
some of the rule specific and generic graph transformations involved. The simplified 
example in Figure 4 (to be explained later) is used to illustrate the algorithm. 

While the algorithm in general supports the concurrent execution of forward, 
backward, and consistency analysis rules, we focus on forward transformations only, 
using the forward transformation rule for a connection as running example. Some 
aspects of the algorithm are omitted, such as the treatment of existing links that have 
become inconsistent due to modifications in the integrated documents. 

4 The distinction between dominant and normal increments is not vital, but helpful for pragmatic 
reasons; see next section. 



[Figure 3: UML activity diagram. Recoverable activities of the construct phase: 
create half links, find possible rule applications, detect overlappings. The legend 
distinguishes generic and rule specific activities.] 

Fig. 3. Integration algorithm. 



Figure 3 shows a UML activity diagram depicting the integration algorithm. To 
perform each activity, one or more graph transformations are executed. Activities that 
require the execution of rule specific transformations are marked grey and italic. The 
overall algorithm is divided into three phases. 

During the first phase (construct), all possible rule applications and conflicts 
between them are determined and stored in the graph. First, for each increment in the 
source document whose type is compatible with the type of the dominant increment of 
any rule, a half link is created that references this increment. Then, for each half link 
the possible rule applications are determined. The last step of this phase is a generic 
transformation marking overlappings between possible rule applications. 

In the next phase (select), for all rule applications the context is checked. If one rule 
application, whose context is present, is unambiguous, it is automatically selected for 
execution. Otherwise, the user is asked to select one rule among the rules with existing 
context. If there are no executable rules, the algorithm ends. 

In the last phase (execute and cleanup), the selected rule is executed and some 
operations are performed to adapt the information that was collected in the construct 
phase to the new situation. 

4.2 Construction Phase 

In the construction phase, it is determined which rules can be possibly applied to which 
subgraphs in the source document. Conflicts between these rules are marked. This in- 
formation is collected once in this phase and is updated later incrementally during the 
repeated executions of the other phases. 

[Figure 4: recoverable panel titles: a) create half links, c) detect overlappings, 
e) execute rule.] 

Fig. 4. Simplified example integration. 



In the first step of the construction phase (create half links), for each increment 
whose type is the type of a dominant increment of at least one rule, a link is 
created that references only this increment (half link). Dominant increments are used as 
anchors for links and to group decisions for user interaction. Half links are the anchors 
for information about possible rule applications and are transformed to consistent links 
after one of the rules has been applied. 

In the example, half links are created for the increments I1 and I3, named L1 and L2, 
respectively (cf. Figure 4 a). 

To achieve this, for each rule a PROGRES production is derived that matches an 
increment with the same type as the rule’s dominant increment in its left-hand side, 
with the negative application condition that there is no half link attached to the 
increment yet. On its right-hand side, the half link node is created and connected to 
the increment with an edge. All these productions are executed repeatedly until no more 
left-hand sides are matched, i.e., half links have been created for all possibly dominant 
increments. 
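
The effect of these productions can be approximated by the following Python 
sketch. It is not PROGRES and not IREEN code: the function name, the dictionary 
encoding of increments and rules, and the type names "A", "B" and "C" are 
assumptions made for illustration, and type compatibility is simplified to 
equality. 

```python
def create_half_links(increments, dominant_types):
    """Create one half link per increment that may act as dominant increment.

    increments: maps increment id -> type name
    dominant_types: maps rule id -> type of the rule's dominant increment
    Returns a dict mapping half link id -> referenced increment id.
    """
    wanted = set(dominant_types.values())
    half_links = {}
    attached = set()  # increments that already carry a half link
    changed = True
    while changed:  # repeat until no production matches any more
        changed = False
        for inc_id, inc_type in increments.items():
            # Negative application condition: no half link attached yet.
            if inc_type in wanted and inc_id not in attached:
                half_links["L%d" % (len(half_links) + 1)] = inc_id
                attached.add(inc_id)
                changed = True
    return half_links


increments = {"I1": "A", "I2": "B", "I3": "C"}   # hypothetical types
dominant_types = {"Ra": "A", "Rb": "C", "Rc": "C"}
print(create_half_links(increments, dominant_types))
# prints: {'L1': 'I1', 'L2': 'I3'}
```

Although two rules share the dominant type of I3, only one half link is created 
for it, because the negative application condition blocks a second match. 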

The second step (find possible rule applications) determines the integration rules that 
are possibly applicable for each half link. A rule is possibly applicable for a given half 
link if the source document part of the left-hand side of the synchronous rule without the 
context increments is matched in the source document graph. The dominant increment 
of the rule has to be matched to the one belonging to the half link. For the possible 
applicability, context increments are not taken into account because missing context 
increments could be created later by the execution of other integration rules. For this 
reason, the context increments are matched in the selection phase before selecting a rule 
for execution. 

transformation + TRC2A_R3_propose * = 
  [graphical part not reproduced; recoverable edge label: toComosDominant] 
  condition '2.status = unchecked; 
  transfer 4'.ruleId := "C2A-R3"; 
end; 

Fig. 5. Find possible rule applications. 

Figure 5 shows the PROGRES transformation for the example rule. The left-hand 
side consists of the half link and the respective dominant increment only because all 
other increments of this rule are context increments. In general, all non-context 
increments and their connecting edges are part of the left-hand side. On the right-hand 
side, a rule node is created to identify the possible rule application (4'). This node 
carries the id of the rule and is connected to the half link. A role node is inserted to 
explicitly store the result of the pattern matching (3'). If there are more increments 
matched, role nodes can be distinguished by an id attribute. The asterisk (*) behind the 
production name tells PROGRES to apply this production for each possible matching of 
its left-hand side. When executed together with the corresponding productions for the 
other rules, as a result all possibly applicable rules are stored at each half link. Please 
note that if a rule is applicable for a half link with different matchings of its source 
increments, multiple rule nodes with the corresponding role nodes are added to the half 
link. 

In the simplified example (Figure 4 b), three possible rule applications were found, 
e.g., Ra at the link L1 would transform the increments I1 and I2. Please note that the 
role nodes are omitted in the figure. 
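
A strongly simplified Python sketch of this step is given below. It is not 
the generated PROGRES code: a rule is reduced to the type of its dominant 
increment plus the types of further non-context increments, which are assumed to 
be direct neighbours of the dominant increment. All names, types and edges are 
hypothetical and chosen so that the result resembles the example. 

```python
from itertools import product


def find_possible_applications(half_links, types, edges, rules):
    """Return a list of (link id, rule id, matched increments).

    half_links: half link id -> dominant increment id
    types: increment id -> type name
    edges: increment id -> set of neighbouring increment ids
    rules: rule id -> (dominant type, tuple of further non-context types)
    """
    applications = []
    for link, dominant in half_links.items():
        for rule_id, (dominant_type, other_types) in rules.items():
            if types[dominant] != dominant_type:
                continue
            # Candidate increments for every further non-context increment.
            candidates = [
                sorted(n for n in edges.get(dominant, ()) if types[n] == t)
                for t in other_types
            ]
            # One rule node per matching, as with the asterisk in PROGRES.
            for combo in product(*candidates):
                if len(set(combo)) == len(combo):  # no increment used twice
                    applications.append((link, rule_id, (dominant, *combo)))
    return applications


types = {"I1": "A", "I2": "B", "I3": "C"}
edges = {"I1": {"I2"}, "I3": {"I2"}}
rules = {"Ra": ("A", ("B",)), "Rb": ("C", ("B",)), "Rc": ("C", ())}
half_links = {"L1": "I1", "L2": "I3"}
for application in find_possible_applications(half_links, types, edges, rules):
    print(application)
```

Context increments do not appear in the rule description at all, which mirrors 
the fact that they are ignored when possible applicability is determined. 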

Each increment can be referenced by one link only as non-context increment. As a 
consequence, there can be conflicts between possible applications of integration 
rules. In the case of a conflict, the user has to choose one of the conflicting rules in 
the selection phase. There are two types of conflicts: First, there can be multiple rule 
nodes at one half link. These share at least the dominant increment, so only one of 
the corresponding rules can be executed. This is the case for link L2 in the example in 
Figure 4 c): Rb and Rc are conflicting. Second, an increment can be referenced by role 
nodes belonging to rule applications of different links. In the example, the increment I2 
is referenced by Ra and Rb. 

In the selection phase, for each link that is involved in a conflict all possible rule 
applications are presented to the user who has to resolve the conflict by selecting one. 
Thus, the conflicts of the first type are directly visible. Conflicts of the second type are 
marked with cross references (hyperlinks) between the conflicting rule applications 
belonging to different links. To prepare the user interaction, conflicts of the second type 
are explicitly marked in the graph. This is done with the help of the generic PROGRES 
production in Figure 6. The pattern on the left-hand side describes an increment ('7) 
that is referenced by two roles belonging to different rule nodes which belong to 
different links. The negative node '4 prevents multiple markings of the same conflict. 
On the right-hand side, an overlap node is inserted between the two rule nodes (O1 in 
the example). Again, this production is marked with an asterisk, so it is executed until 
all conflicts are detected. Besides detecting conflicts between different forward 
transformation rules, the depicted production also detects conflicts between forward, 
backward, and consistency analysis rules generated from the same synchronous rule. As 
a result of that, it is not necessary to check whether the non-context increments of the 
right-hand side of the synchronous rule are already present in the target document when 
determining possible rule applications in the second step of this phase. 

[Figure 6: PROGRES production "transformation + GEN_detectRuleConflicts".] 

Fig. 6. Detect overlappings. 

In the example in Figure 4 c), the overlap node O1 is created between Ra and Rb
because they both reference I2. The conflict between Rb and Rc is not explicitly marked
because it can be seen from the fact that they both belong to the same half link.
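
As a rough illustration of how the overlap marking behaves when it is applied repeatedly, the following Python sketch mimics the effect of the production. The set-based graph encoding, the function name, and the increment name D2 are assumptions for this example; the membership test merely plays the part of the negative node.

```python
from itertools import combinations

# Hypothetical data: roles connect a rule node to an increment,
# rule_link maps each rule node to its half link.
roles = {("Ra", "I1"), ("Ra", "I2"), ("Rb", "I2"), ("Rb", "D2"), ("Rc", "D2")}
rule_link = {"Ra": "L1", "Rb": "L2", "Rc": "L2"}


def detect_overlaps(roles, rule_link, overlaps):
    """Mark each pair of rule nodes of different links that share an increment.

    Returns the number of overlap markings added by this call.
    """
    added = 0
    for (rule_a, inc_a), (rule_b, inc_b) in combinations(sorted(roles), 2):
        if inc_a != inc_b or rule_link[rule_a] == rule_link[rule_b]:
            continue  # different increments, or both rules at the same half link
        pair = frozenset((rule_a, rule_b))
        if pair in overlaps:
            continue  # already marked: stands in for the negative node
        overlaps.add(pair)
        added += 1
    return added


overlaps = set()
first = detect_overlaps(roles, rule_link, overlaps)
second = detect_overlaps(roles, rule_link, overlaps)
print(first, second, overlaps)
```

In this sketch the second call adds nothing, which corresponds to the requirement that the same conflict is not marked twice.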



4.3 Selection Phase 

The goal of the selection phase is to select one possible rule application for execution
in the next phase. If there is a rule that can be executed without conflicts, the selection
is performed automatically; otherwise the user is asked for a decision. Before a rule
is selected, the contexts of all rules are checked because only a rule whose context has
been found can be executed.

[Figure: PROGRES production TRC2A_R3_contextCheck *, with nodes such as '2 : ComosOutPort, '6 : subLink, and '9 : AspenOutPort, and the conditions '5.ruleId = "C2A-R3" and '7.status = checked]
Fig. 7. Check context.



The context check is performed in the first step of this phase. The context is formed 
by all context elements from the synchronous rule. It may consist of increments of 
source and target documents and of links contained in the integration document. 

In the example in Figure 4 d), the context for Ra consisting of increment I3 in the
source document was found (C1). The context for Rb is empty (C2); the context for Rc
is still missing.

Figure 7 shows the PROGRES production checking the context of the example
integration rule. The left-hand side contains the half link ('7), the non-context increments
(here, only '3), the rule node ('5), and the role nodes ('1). The non-context
increments and their roles are needed to embed the context and to prevent unwanted folding
between context and non-context increments. For the example rule, the context consists
of the two ports connected in the source document ('2, '4), the related ports in the
Aspen document ('9, '10), and the relating sublinks ('6, '8).

On the right-hand side, a new context node is created ('15). It is connected to all
nodes belonging to the context by role nodes (11', 12', 13', 14', 16', 17') and
appropriate edges. If the matching of the context is ambiguous, multiple context nodes
with their roles are created as the production is executed for all matches.

Because the selection phase is executed repeatedly, each context match (context node
and role nodes) must be added to the graph only once. The context match cannot be
included directly as negative nodes on the left-hand side because
edges between negative nodes are prohibited in PROGRES. Therefore, this is checked
using an additional graph test which is called in the restriction on the rule node. The
graph test is not presented here because it is rather similar to the right-hand side of this
production.

The context is checked for all possible rule applications. To make sure that the
context belonging to the right rule is checked, the rule id is checked in the condition
part of the productions. After the context of a possible rule application has been found,
the rule can be applied.
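
The bookkeeping described above (check the rule id, then add each context match only once) might be sketched as follows. This is a simplified Python illustration, not the PROGRES production or its graph test; the dictionary layout, the function name, and the port names p1 to p4 are hypothetical.

```python
def record_context_matches(rule_node, rule_id, matches, contexts):
    """Store every context match of a rule node at most once.

    rule_node: dict with hypothetical keys "name" and "ruleId"
    matches:   iterable of sets of context elements found by pattern matching
    contexts:  dict mapping rule node names to the matches recorded so far
    Returns the number of matches newly added by this call.
    """
    if rule_node["ruleId"] != rule_id:
        return 0  # condition part: only the production of the right rule applies
    known = contexts.setdefault(rule_node["name"], [])
    added = 0
    for match in matches:
        key = frozenset(match)
        if key not in known:  # stands in for the additional graph test
            known.append(key)
            added += 1
    return added


contexts = {}
node = {"name": "Ra", "ruleId": "C2A-R3"}
found = [{"p1", "p2"}, {"p3", "p4"}]  # an ambiguous context: two matches

first_run = record_context_matches(node, "C2A-R3", found, contexts)
second_run = record_context_matches(node, "C2A-R3", found, contexts)
other_rule = record_context_matches(node, "C2A-R1", found, contexts)
print(first_run, second_run, other_rule)
```

Running the check a second time leaves the recorded matches unchanged, which is the property the repeated selection phase relies on.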



[Figure: PROGRES production GEN_selectRuleAndContextAutomaticallylL (out selRule : rule), with possibleRule, selectedRule, and selectedContext edges; return selRule := '1]
Fig. 8. Select unambiguous rule.



After the context has been checked for all possible rule applications, some rules can
be applied, others still have to wait for their context. The next step of the algorithm (find
unambiguous rule) tries to find a rule application that is not involved in any conflict. The
conflicts have already been determined in the construction phase. Because any increment
may be referenced by an arbitrary number of links as context, no new conflicts are
induced by the context part of the integration rules. The generic PROGRES production
in Figure 8 finds rule applications that are not part of a conflict. On the left-hand side,
a rule node ('1) is searched for that has only one context node and is not related to any
overlap node. It has to be related to exactly one half link ('2) that does not have another
rule node. For forward transformation rules, a rule node belongs to one link only,
while nodes of consistency analysis rules are referenced by two half links. Therefore,
another production, which is not shown here, is used for consistency analysis rules. A
rule node is not selected for execution if there are conflicting rules, even if their context
is still missing. As the context may be created later, the user has to decide whether to
execute this rule and thereby make the execution of the other rules impossible.

If a match is found in the host graph, the rule node and the context node are se- 
lected for execution by substituting their referencing edges by selectedRule and 
selectedContext edges, respectively. The rule node is returned in the output pa- 
rameter selRule. The corresponding rule can be applied in the execution phase. 

In the example in Figure 4 d), no rule can be automatically selected for execution. 
The context of Rc is not yet available and Ra and Rb as well as Rb and Rc are conflict- 
ing. 
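
The selection criteria for forward transformation rules can be summarised in a small Python sketch. The data layout and the function name are assumptions for illustration; the sketch covers only rule nodes that belong to a single half link and does not model consistency analysis rules.

```python
def find_unambiguous(rule_link, contexts, overlaps):
    """Return (rule, context) for a rule application without conflicts, else None.

    rule_link: dict rule node -> half link
    contexts:  dict rule node -> list of context nodes found so far
    overlaps:  set of frozensets, each holding two overlapping rule nodes
    """
    for rule, link in sorted(rule_link.items()):
        # the half link must not carry another rule node
        alone = all(l != link or r == rule for r, l in rule_link.items())
        # the rule node must not be related to any overlap node
        no_overlap = all(rule not in pair for pair in overlaps)
        # exactly one context node must have been found
        if alone and no_overlap and len(contexts.get(rule, [])) == 1:
            return rule, contexts[rule][0]
    return None


# Situation of Figure 4 d): nothing can be selected automatically.
blocked = find_unambiguous(
    {"Ra": "L1", "Rb": "L2", "Rc": "L2"},
    {"Ra": ["C1"], "Rb": ["C2"], "Rc": []},
    {frozenset(("Ra", "Rb"))},
)

# A hypothetical conflict-free situation with a single rule node.
free = find_unambiguous({"Ra": "L1"}, {"Ra": ["C1"]}, set())
print(blocked, free)
```

Note that Rc blocks Rb in the first call even though the context of Rc is missing, in line with the rule that conflicting rules without context still prevent automatic selection.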

If no rule could be selected automatically, the user has to decide which rule has to 
be executed. Therefore, in the next step (find decisions), all conflicts are collected and 
presented to the user. For each half link, all possible rule applications are presented. If 
a rule application conflicts with another rule of a different half link, this is presented 
as annotation at both half links. Rules that are not executable due to a missing context 
are included in this presentation but cannot be selected for execution. This information 
allows the user to select a rule manually, knowing which other rule applications will be
made impossible by this decision. If there are no decisions left, the algorithm terminates.
If there are still half links left at the end of the algorithm, the user has to perform the 
rest of the integration manually. If there are decisions, the result of the user interaction 
is stored in the graph (ask for user decision) and the selected rule is executed in the 
execution phase. In the example, the user selects rule Ra. 

4.4 Execution Phase 

The rule that was selected in the selection phase is executed in the execution phase. 
Afterwards, the information collected during the construction phase has to be updated. 

In the example (Figure 4 e), the corresponding rule of the rule node Ra is executed.
As a result, the increments I4 and I5 are created and references to all increments are
added to the link L1.

Rule execution is performed by a rule-specific PROGRES production, see Figure 9.
The left-hand side of the production is nearly identical to the right-hand side of the
context check production in Figure 7. The main difference is that the edge from the link
('10) to the rule node ('7) is now a selectedRule edge and the edge from the rule
node to the context node ('13) is a selectedContext edge. The possibleRule
and possibleContext edges are replaced when a rule together with a context is
selected for execution either by the user or automatically.

On the right-hand side, the new increments in the target document are created and
embedded by edges. In this case, the connection (18') is inserted and connected to the
two Aspen ports (14', 15'). The half link (10') is extended to a full link, referencing
all context and non-context increments in source and target document. The information
about the applied rule and roles etc. is kept to be able to detect inconsistencies occurring
later due to modifications in source and target documents.

[Figure: PROGRES production TRC2A_R3_apply (selRule : rule), with transfer 10'.status := ruleBasedConsistent]
Fig. 9. Execute rule.

The following steps of the algorithm are performed by generic productions that
update the information about possible rule applications and conflicts. First, obsolete half
links are deleted. A half link is obsolete if its dominant increment is referenced by
another link as non-context increment. In the example this is not the case for any half link
(Figure 4 e). Then, possible rule applications that are no longer possible are removed.
In Figure 4 f), Rb is deleted because it depends on the availability of I2, which is now
referenced by L1 as non-context increment. If there were alternative rule applications
belonging to L1, they would be removed as well. Last, obsolete overlappings have to
be deleted. In the example, O1 is removed because Rb was deleted. Please note that the
cleanup procedure may change depending on how detailed the integration process has
to be documented.
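
The three cleanup steps can be sketched in Python as follows. The encoding and all names are assumptions made for this illustration (including D2 as the dominant increment of L2); the generic PROGRES productions themselves are not shown in the text.

```python
def cleanup(executed, apps, dominant, overlaps):
    """Update the bookkeeping after the rule application `executed` has run.

    apps:     dict rule node -> {"link": ..., "increments": set of non-context increments}
    dominant: dict half link -> dominant increment
    overlaps: set of frozensets of two rule nodes
    Returns (obsolete half links, remaining possible applications, remaining overlaps).
    """
    link = apps[executed]["link"]
    consumed = apps[executed]["increments"]

    # 1. Half links whose dominant increment is now used by another link.
    obsolete_links = {
        other for other, inc in dominant.items()
        if other != link and inc in consumed
    }
    # 2. Rule applications that are no longer possible.
    removed = {
        rule for rule, app in apps.items()
        if rule != executed and (
            app["link"] == link                # alternatives at the executed link
            or app["link"] in obsolete_links   # their half link was deleted
            or app["increments"] & consumed    # a needed increment is taken
        )
    }
    remaining = {r: a for r, a in apps.items() if r != executed and r not in removed}
    # 3. Overlaps that involve a removed rule application.
    kept_overlaps = {pair for pair in overlaps if not (pair & removed)}
    return obsolete_links, remaining, kept_overlaps


apps = {
    "Ra": {"link": "L1", "increments": {"I1", "I2"}},
    "Rb": {"link": "L2", "increments": {"D2", "I2"}},
    "Rc": {"link": "L2", "increments": {"D2"}},
}
result = cleanup("Ra", apps, {"L1": "I1", "L2": "D2"}, {frozenset(("Ra", "Rb"))})
print(result)
```

With this input no half link becomes obsolete, Rb is removed because I2 is taken, only Rc remains as a possible application, and the overlap between Ra and Rb disappears.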

5 Related Work 

Our approach to incremental integration for development processes is based on the
triple graph grammar approach introduced by Schürr [3] and early work at our
department in the area of software engineering [11] during the IPSEN project [12]. We
adapted the results to the domain of chemical engineering [9] and extended the original
approach: now, we are dealing with the problem of a-posteriori integration, the rule
definition formalism was modified [13] and the rule execution algorithm was further
elaborated to support conflict detection (see Section 4).

Related areas of interest in computer science are (in)consistency checking [14]
and model transformation. Consistency checkers apply rules to detect inconsistencies 
between models which then can be resolved manually or by inconsistency repair rules. 
Model transformation deals with consistent translations between heterogeneous mod- 
els. In the following a few projects of both areas are presented which are using graph 
transformations. Our approach contains aspects of both areas but is more closely related 
to model transformation. 

In [15], a consistency management approach for different view points [16] of de- 
velopment processes is presented. The formalism of distributed graph transformations 
[17] is used to model view points and their interrelations, especially consistency checks 
and repair actions. To the best of our knowledge, this approach works incrementally but 
does not support detection of conflicting rules and user interaction. 

The consistency management approach of Fujaba [18] supports inter-model consis- 
tency checks. The approach is based on triple graph grammars [3] as well. Comparable 
to our approach, different graph transformations are derived from each triple rule. User 
interaction is restricted to choosing the repair action for a detected inconsistency. Con- 
flict detection between different inconsistency checking rules is supported only w.r.t. 
preventing endless loops if repair actions create new inconsistencies. 

Model transformation recently gained increasing importance because of the model 
driven approaches for software development like the model driven architecture (MDA) 
[19]. In [20] and [21] some approaches are compared and requirements are proposed. 

The PLCTools prototype [2] allows the translation between different specification
formalisms for programmable controllers. The translation is inspired by the triple graph
grammar approach [3] but is restricted to 1:n mappings. The rule base is conflict free so
there is no need for conflict detection and user interaction. It can be extended by
user-defined rules which are restricted to be unambiguous 1:n mappings. Incrementality is
not supported.

In the AToM³ project [1], modelling tools are generated from descriptions of their
meta models. Transformations between different formalisms can be defined using graph 
grammars. The transformations do not work incrementally but support user interaction. 
Unlike in our approach, the control of the transformation is contained in the user-defined 
graph grammars. 

The QVT Partners’ proposal [22] to the QVT RFP of the OMG [23] is a relational
approach based on the UML and very similar to the work of Kent [24]. While Kent 
is using OCL constraints to define detailed rules, the QVT Partners propose a graphi- 
cal definition of patterns and operational transformation rules. Incrementality and user 
interaction are not supported. 

BOTL [25] is a transformation language based on UML object diagrams. Comparable
to graph transformations, BOTL rules consist of an object diagram on the left-hand
side and another one on the right-hand side, both describing patterns. Unlike graph
transformations, the former one is matched in the source document and the latter one is
created in the target document. The transformation process is neither incremental nor
interactive. There are no conflicts because of very restrictive constraints on the rules.

Transformations between documents are urgently needed (not only) in chemical 
engineering. They have to be incremental, interactive and bidirectional. Additionally, 
transformation rules are most likely ambiguous. There are a lot of transformation ap- 
proaches and consistency checkers with repair actions that can be used for transforma- 
tion as well, but none of them fulfills all of these requirements. Especially, the detection 
of conflicts between ambiguous rules is not supported. We address these requirements 
with the integration algorithm described in this contribution. 



6 Conclusion 

We have presented a novel approach to the execution of integration rules in incremental
and interactive integration tools using graph transformations. Our approach was evaluated
in an industrial cooperation with the German software company Innotec with a
simplified prototype for the integration of flow sheets and simulation models implemented
in C++. Experiments with the prototype showed that our approach considerably
eases the task of keeping dependent documents consistent with each other. Nevertheless,
there is still the need for a lot of user interaction. Besides choosing among different
possible rules, the user has to manually resolve contradictory changes that have been
made to the documents in parallel. The main ideas realized in the prototype have been
incorporated into Innotec’s product Comos PT and are well accepted by their customers. Of
course, the rule execution formalism had to be simplified and the remaining complexity
is hidden from the end user.



Acknowledgements 

This work was in part funded by the CRC 476 IMPROVE of the Deutsche Forschungs- 
gemeinschaft (DFG). Furthermore, the authors gratefully acknowledge the fruitful co- 
operation with Innotec. 



References 

1. de Lara, J., Vangheluwe, H.: Computer aided multi-paradigm modelling to process petri-nets
and statecharts. In: Proc. of 1st Int. Conf. on Graph Transformations (ICGT 2002). LNCS
2505, Springer (2002) 239-253

2. Baresi, L., Mauri, M., Pezzè, M.: PLCTools: Graph transformation meets PLC design.
Electronic Notes in Theoretical Computer Science 72 (2002)

3. Schürr, A.: Specification of graph translators with triple graph grammars. In: Proc. of the
20th Intl. Workshop on Graph-Theoretic Concepts in Computer Science (WG 1994). LNCS
903, Herrsching, Germany, Springer (1995) 151-163

4. Becker, S.M., Westfechtel, B.: Incremental integration tools for chemical engineering: An
industrial application of triple graph grammars. In: Proc. of the 29th Workshop on
Graph-Theoretic Concepts in Computer Science (WG 2003). LNCS 2880, Springer (2003) 46-57







5. Lohmann, S.: Ausführung von Integrationsregeln mit einem Graphersetzungssystem.
Master’s thesis, RWTH Aachen University, Germany (2004)

6. Nagl, M., Marquardt, W.: SFB-476 IMPROVE: Informatische Unterstützung übergreifender
Entwicklungsprozesse in der Verfahrenstechnik. In: Informatik ‘97: Informatik als
Innovationsmotor. Informatik aktuell, Aachen, Germany, Springer (1997) 143-154

7. innotec GmbH: COMOS PT Documentation. http://www.innotec.de (2003)

8. Aspen-Technology: Aspen Plus Documentation. http://www.aspentech.com (2003)

9. Becker, S., Haase, T., Westfechtel, B., Wilhelms, J.: Integration tools supporting cooperative
development processes in chemical engineering. In: Proc. of the 6th Biennial World Conf. on
Integrated Design and Process Technology (IDPT-2002), Pasadena, California, USA, Society
for Design and Process Science (2002) 10 pp.

10. Schürr, A., Winter, A., Zündorf, A.: The PROGRES approach: Language and environment.
Volume 2. World Scientific (1999) 487-550

11. Lefering, M., Schürr, A.: Specification of integration tools. [12] 324-334

12. Nagl, M., ed.: Building Tightly-Integrated Software Development Environments: The IPSEN
Approach. LNCS 1170. Springer, Berlin, Germany (1996)

13. Becker, S.M., Haase, T., Westfechtel, B.: Model-based a-posteriori integration of engineering
tools for incremental development processes. Journal of Software and Systems Modeling
(2004) to appear.

14. Spanoudakis, G., Zisman, A.: Inconsistency management in software engineering: Survey
and open research issues. In: Handbook of Software Engineering and Knowledge
Engineering. Volume 1. World Scientific (2001) 329-380

15. Enders, B.E., Heverhagen, T., Goedicke, M., Tropfner, P., Tracht, R.: Towards an integration
of different specification methods by using the viewpoint framework. Transactions of the
SDPS 6 (2002) 1-23

16. Finkelstein, A., Kramer, J., Goedicke, M.: ViewPoint oriented software development. In:
Intl. Workshop on Software Engineering and Its Applications. (1990) 374-384

17. Taentzer, G., Koch, M., Fischer, I., Volle, V.: Distributed graph transformation with
application to visual design of distributed systems. In: Handbook on Graph Grammars and
Computing by Graph Transformation: Concurrency, Parallelism, and Distribution. Volume 3.
World Scientific (1999) 269-340

18. Wagner, R., Giese, H., Nickel, U.A.: A plug-in for flexible and incremental consistency
management. In: Proc. of the Intl. Conf. on the Unified Modeling Language (UML 2003),
San Francisco, California, USA, Springer (2003)

19. OMG Architecture Board ORMSC: Model driven architecture (MDA) (2001)

20. Gerber, A., Lawley, M., Raymond, K., Steel, J., Wood, A.: Transformation: The missing link
of MDA. In: Proc. of 1st Intl. Conf. on Graph Transformations (ICGT 2002). LNCS 2505,
Barcelona, Spain, Springer (2002) 90-105

21. Kent, S., Smith, R.: The Bidirectional Mapping Problem. Electronic Notes in Theoretical
Computer Science 82 (2003)

22. Appukuttan, B.K., Clark, T., Reddy, S., Tratt, L., Venkatesh, R.: A model driven approach to
model transformations. In: Proc. of the 2003 Model Driven Architecture: Foundations and
Applications (MDAFA2003). CTIT Technical Report TR-CTIT-03-27, Univ. of Twente, The
Netherlands (2003)

23. OMG: MOF 2.0 query / view / transformations, request for proposal (2002)

24. Akehurst, D., Kent, S., Patrascoiu, O.: A relational approach to defining and implementing
transformations between metamodels. Journal on Software and Systems Modeling 2 (2003)

25. Braun, P., Marschall, F.: Transforming object oriented models with BOTL. Electronic Notes
in Theoretical Computer Science 72 (2003)




Composition of Relations 
in Enterprise Architecture Models 



Rene van Buuren, Henk Jonkers, Maria-Eugenia Iacob, and Patrick Strating 

Telematica Instituut, P.O. Box 589, 7500 AN Enschede, The Netherlands 
{Rene.vanBuuren, Jonkers, Iacob, Strating}@telin.nl



Abstract. Enterprise architecture focuses on modelling different do- 
mains relevant for businesses or organisations. A major issue is how to 
express and maintain the relations between different modelling domains. 
Current architectural support focuses mainly on modelling techniques and
languages for single domains. For enterprise architectures it is important
to have the flexibility to create cross domain models and views in which 
inter-relations are made explicit. Therefore, a language for enterprise 
architecture models should pay particular attention to the relations be- 
tween domain models. In this paper we present a general approach to 
derive an operator that allows for the composition of relations in archi- 
tecture description languages. This general approach opens the door for 
a number of interesting application areas, two of which are worked out 
in more detail: the creation of more modelling flexibility, by allowing 
to leave out certain details, and automated abstraction and complexity 
reduction of models facilitating stakeholder-specific visualisations. For a 
specific enterprise architecture modelling language, we explicitly derive 
this composition operator. Because of the specific properties of this op- 
erator, the transitive closure of the metamodel of this language can be 
determined. 



1 Introduction 

Many enterprises have only limited insight into the coherence between their business
processes and ICT. An enterprise can be viewed as a complex ‘system’
with multiple domains that may influence each other. In general, architectures 
are used to describe components, relations and underlying design principles of a 
system [8]. Constructing architectures for an enterprise may help to increase in- 
sight and overview required to successfully align the business and ICT. Although 
the value of architecture has been recognised by many organisations, mostly sep- 
arate architectures are constructed for various organisational domains, such as 
business processes, applications, information and technical infrastructure. The 
relations between these architectures often remain unspecified or implicit. 

In contrast to architectural approaches for models within a domain (e.g., the
Unified Modelling Language, UML [2], for modelling applications or the technical
infrastructure, or the Business Process Modelling Notation, BPMN [4], for
modelling business processes), enterprise architecture focuses on establishing a
coherent view of an enterprise. The term refers to a description of all the relevant
elements that make up an enterprise and how those elements inter-relate. Models
play an important role in all approaches to enterprise architecture. Models are
well suited to express the inter-relations among the different elements of an
enterprise and, especially if they can be visualised in different ways, they can help
to alleviate the language barriers between the domains. Although a full-fledged
enterprise architecture approach encompasses much more than just modelling,
models play a central role in the support for enterprise architects that we develop
within the ArchiMate project.

H. Ehrig et al. (Eds.): ICGT 2004, LNCS 3256, pp. 39-53, 2004.
(c) Springer-Verlag Berlin Heidelberg 2004

A similar movement towards integrated models can be recognised in the 
Model Driven Architecture (MDA) approach to software development [6] . MDA 
is a collection of standards of the Object Management Group (OMG) that raise 
the level of abstraction at which software solutions are specified. Typically, MDA 
results in software development tools that support specification of software in 
UML instead of in a programming language like Java. Recently, OMG has ex- 
tended its focus to more business-oriented concepts and languages, to be devel- 
oped within the MDA framework. These developments make MDA just as rele- 
vant for enterprise architecture as it is now for software development. The MDA 
trend reflects the growing awareness that it is important to take into account 
business considerations in software development decisions. Therefore, enterprise 
architectures form a natural starting point for automated software engineering. 

In existing architectural description languages, the relations that are allowed 
between concepts are fixed: they are specified either informally or by means 
of a formal metamodel. Often, architects require the flexibility to create cross 
domain views or models. In practice such models are created but they often lack 
a formal and well-defined meaning. However, a need for a well-defined meaning
becomes apparent in the context of maintainability and analyses performed on
architectures, such as impact-of-change analysis. Because relations play such
a central role in enterprise architecture, we take a closer look at the properties 
of these inter-domain relations. In particular, this paper focuses on the question 
how indirect relations between concepts can be formally derived by defining how 
existing relations in models can be composed into new explicit relations. 

This paper is organised as follows. Section 2 outlines our problem description 
using the ArchiMate language. Section 3 forms the core of this paper, in which we 
present a general approach for deriving a composition operator for architectural 
relations in any metamodel, and apply this approach to derive this operator for 
the metamodel of the ArchiMate language. In Section 4, two examples illustrate 
how the derived composition operator can be applied in modelling practice. In 
Section 5 we discuss related work. Finally, we draw our conclusions in Section 6. 

2 Problem Description 

As the basis for our approach, we use the core of the architecture description 
language of the ArchiMate project [9]. Figure 1 shows the metamodel with the 
core concepts of our language and the relations that they may have, expressed 
as a UML class diagram. The language can be used to model the business layer 
of an enterprise (e.g., the organisational structure and business processes), the
application layer (e.g., application components) and the relation between these
layers. The UML classes in the metamodel represent the concepts, while the
UML associations represent the possible relations that they may have. The roles
in these associations indicate the relation type. Note that a relation between
concepts in the metamodel defines a relation that is permitted between instances
of these concepts. A model contains model elements, which are instances of the
concepts in the metamodel, and actual relations between them.

Fig. 1. Metamodel of the core of the enterprise architecture description language.

Figure 2 shows a fragment from the ArchiMate metamodel, as well as a 
small model with instances of the concepts and relationships corresponding to 
this fragment. The relations represented by solid lines are not the only relations 
that exist: because of the coherent nature of the metamodel, all concepts are 
(explicitly or implicitly) linked to each other. 




Fig. 2. Composition of relations in metamodel and model. 



For example, there is a path between the application component (Order
administration system) and the organisational service (Provide information), with
three intermediary concepts. This means that these concepts are related to each
other in some way, although it is not immediately clear what the nature of this
relation is.

The direct relations as specified in the metamodel are well defined, but a 
rule for the composition of relations is not readily available. We will show in this 
paper how a composition operator for relations can be constructed based on the 
original metamodel. 

The main question that we want to address is: given a (meta)model with concepts
(model elements) and relations between them, can we say something about
the implicit relations between concepts (model elements) that are not captured in
the (meta)model?

In principle, including these relations amounts to an extension of the original
metamodel. Having extended the metamodel allows for the construction of a
larger family of models that conform to the (extended) metamodel, i.e., it creates
more modelling flexibility. This flexibility becomes apparent from the ability to
define new models that conform to the extended metamodel. Another issue
relates to the composition of relations in existing models. Since there is an
infinite number of potential models (conforming to a single metamodel), one has
to show that models still conform to the (extended) metamodel after applying
a composition of relations.

In the context of this paper, all of the relations have a direction, indicated 
by the arrows. The direction is needed for the definition of our composition 
operator. Certain relations (e.g., ‘assignment’ and ‘association’) can be used in 
either direction, which we denote by two separate directed relations. 

Note that for clarity, we have omitted recent extensions to the language 
that cover, e.g., concepts for modelling the technical infrastructure. These ex- 
tensions are not necessary to explain and illustrate the results described in this 
paper; however, the results do apply to these extensions as well. The relations considered
here could be described as structural relations, which model the coherence between different
‘layers’ and ‘aspects’ in enterprise architecture. Other relations, such as triggers 
and flows, as well as relations such as specialisation, are not considered. 

3 Derivation of a Composition Operator 

In this section we present a general approach to derive a composition operator 
for an architectural description language, and this approach is applied to the 
ArchiMate language. In this context, we assume that an architectural description 
language consists of a number of concepts that represent the ‘components’ in 
an architectural description and possible relations between these concepts. The 
abstract syntax of the language, i.e., the concepts and possible relations between 
them, is defined by the metamodel of the language. 

3.1 Formalisation of Metamodel and Composition Operator 

A metamodel (M) can be formally defined as a 3-tuple: $M = (C, T, R)$, where:

— $C$ is a set of concepts.

— $T$ is a set of relation types.

— $R \subseteq C \times C \times T$ is a set of relations between concepts. More precisely, an
element $(c_1, c_2, t) \in R$ is called a ‘relation’ and it expresses the fact that a
relation of type $t$ exists from concept $c_1$ to $c_2$.

Our relation types and relations correspond to relationships and links, respectively,
in UML. Let us assume that there exists a composition operator for relation types,
denoted by $\otimes$. Note that $T$ only contains the relation types that
currently occur in the metamodel. Since we cannot be certain beforehand that $T$
is closed with respect to the composition operator, we define an extension of $T$,
denoted by $T^*$, consisting of $T$ and all relation types that may be the result of
the composition of relation types in $T^*$. By definition, $T \subseteq T^*$. Now, we define
the composition operator for relation types on the extended set of relation types:
$\otimes : T^* \times T^* \to T^*$.

With this composition operator for relation types, we can now define the set
of ‘two step’ relations, $R^2$:

$$\forall c_1, c_2, c_3 \in C \; \forall t_1, t_2 \in T : (c_1, c_2, t_1) \in R \wedge (c_2, c_3, t_2) \in R \Rightarrow (c_1, c_3, t_1 \otimes t_2) \in R^2$$

Provided that the composition operator is associative, i.e.,
$\forall t_1, t_2, t_3 \in T^* : (t_1 \otimes t_2) \otimes t_3 = t_1 \otimes (t_2 \otimes t_3)$,
then for any $n > 2$, the set of ‘n step’ relations, $R^n$, can
be defined recursively:

$$\forall c_1, c_{n-1}, c_n \in C \; \forall t_1, t_2 \in T^* : (c_1, c_{n-1}, t_1) \in R^{n-1} \wedge (c_{n-1}, c_n, t_2) \in R \Rightarrow (c_1, c_n, t_1 \otimes t_2) \in R^n$$

Because pair-wise composition in a chain of relations may be performed in an
arbitrary order, associativity of $\otimes$ ensures that the resulting $R^n$ is unambiguously
defined. Now, the transitive closure, $R^*$, of the set of relations can be defined as:

$$R^* = \bigcup_{n=1}^{\infty} R^n$$

As an extension of this, we define the transitive closure of the metamodel as
$M^* = (C, T^*, R^*)$.
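For a finite metamodel the union of the ‘n step’ sets can be computed by iterating until no new relation appears. The sketch below is one possible way to do this, under the assumption that relation types compose by taking the lower of the weights listed in Table 2 (Section 3.5); the function names are hypothetical.

```python
from typing import Callable, Set, Tuple

# (source concept, target concept, relation type)
Relation = Tuple[str, str, str]

# Weights as listed in Table 2 (Section 3.5).
WEIGHT = {"association": 1, "access": 2, "used by": 3, "realisation": 4,
          "assignment": 5, "aggregation": 6, "composition": 7}


def compose_types(t1: str, t2: str) -> str:
    """Type composition: the relation type with the lower weight wins."""
    return t1 if WEIGHT[t1] <= WEIGHT[t2] else t2


def transitive_closure(relations: Set[Relation],
                       compose: Callable[[str, str], str]) -> Set[Relation]:
    """Union of the 'n step' relation sets for n = 1, 2, ... (finite input)."""
    closure = set(relations)    # starts with the one-step relations
    frontier = set(relations)   # relations found in the previous round
    while frontier:
        # Extend every newly found relation by one direct relation.
        extended = {
            (c1, c3, compose(t1, t2))
            for (c1, c2, t1) in frontier
            for (c2b, c3, t2) in relations
            if c2 == c2b
        }
        frontier = extended - closure
        closure |= frontier
    return closure


R = {
    ("application function", "application service", "realisation"),
    ("application service", "business process", "used by"),
}
R_star = transitive_closure(R, compose_types)
```

The loop terminates because only finitely many triples over the given concepts and relation types exist, which also covers metamodels whose graph contains cycles.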

Given the composition operator $\otimes$ for relation types, we can now also define
an operator $\otimes_R : R^* \times R^* \to R^*$ for relations. Consider $r_1, r_2 \in R^*$, with
$r_1 = (c_1, c_2, t_1)$ and $r_2 = (c_2, c_3, t_2)$. First we define the projection operators
$\pi_1$, $\pi_2$, and $\pi_3$, where $\pi_i$ takes the $i$-th element of each tuple in $R^*$. In case
$\pi_1(r_2) = \pi_2(r_1)$ the operator $\otimes_R : R^* \times R^* \to R^*$ is given by:

$$r_1 \otimes_R r_2 = (\pi_1(r_1), \pi_2(r_2), \pi_3(r_1) \otimes \pi_3(r_2))$$

If $\pi_1(r_2) \neq \pi_2(r_1)$, $\otimes_R(r_1, r_2)$ is not defined. It is easy to prove that the
associativity of $\otimes_R$ results from the definition of $\otimes_R$ and the associativity of $\otimes$.
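Because the operator on relations is only defined when the two relations share their middle concept, a direct encoding has to represent the undefined case explicitly. A minimal sketch, assuming `None` stands for ‘not defined’ and using a one-entry lookup for the type composition stated in Section 3.4 (the names are hypothetical):

```python
from typing import Callable, Optional, Tuple

# (source concept, target concept, relation type)
Relation = Tuple[str, str, str]


def compose_relations(r1: Relation, r2: Relation,
                      compose_types: Callable[[str, str], str]
                      ) -> Optional[Relation]:
    """Compose r1 with r2; return None when r2 does not start where r1 ends."""
    source, middle, t1 = r1
    start, target, t2 = r2
    if start != middle:
        return None  # the operator is not defined for this pair
    return (source, target, compose_types(t1, t2))


def lookup(t1: str, t2: str) -> str:
    """Single known entry: realisation composed with used by gives used by."""
    return {("realisation", "used by"): "used by"}[(t1, t2)]


r1 = ("application function", "application service", "realisation")
r2 = ("application service", "business process", "used by")

composed = compose_relations(r1, r2, lookup)
undefined = compose_relations(r2, r1, lookup)  # the chain does not connect
```

Passing the type composition as a parameter keeps the relation-level operator independent of how the type-level operator is obtained.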

The transitive closure of the metamodel provides a specification of relations 
that may be drawn between concept instances in actual models. In this way, we 
can keep the metamodel relatively simple by only defining the direct relations 
between concepts, while the other relations that are allowed can be derived by 
means of the composition operator.

3.2 Extension to the Model Level

Thus far, we have derived a composition operator that applies to relations in a 
metamodel of a language. However, we would also like to be able to compose 
relations in models expressed in the language defined by a metamodel. 

A model (A) can be formally defined as a 4-tuple: $A = (E, T^*, Q, F_C)$, where:

– $E$ is a set of model elements

– $T^*$ is a set of relation types (the same relation types that are used in the
metamodel)

– $Q \subseteq E \times E \times T^*$ is a set of relations between model elements

– $F_C : E \to C$ is a function that maps model elements to metamodel concepts

Definition 1. A model $A = (E, T^*, Q, F_C)$ conforms to a metamodel
$M = (C, T^*, R)$ if and only if
$\forall t \in T^* \; \forall e_1, e_2 \in E : (e_1, e_2, t) \in Q \Rightarrow (F_C(e_1), F_C(e_2), t) \in R$.
We adopt the notation $A \models M$ for this.
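Conformance is a check over all model relations, so it translates into a single universally quantified test. The sketch below assumes the model is given as a set of triples and the mapping from elements to concepts as a dictionary; the element name ‘payment service’ is hypothetical, while ‘Collect premium’ is borrowed from the example in Section 4.2.

```python
from typing import Dict, Set, Tuple

Relation = Tuple[str, str, str]


def conforms(model_relations: Set[Relation],
             concept_of: Dict[str, str],
             metamodel_relations: Set[Relation]) -> bool:
    """True if every model relation maps onto a relation of the metamodel."""
    return all(
        (concept_of[e1], concept_of[e2], t) in metamodel_relations
        for (e1, e2, t) in model_relations
    )


# Metamodel relation quoted in Section 3.4.
R = {("application service", "business process", "used by")}

# Mapping from model elements to concepts ('payment service' is hypothetical).
F_C = {
    "payment service": "application service",
    "Collect premium": "business process",
}

Q_ok = {("payment service", "Collect premium", "used by")}
Q_reversed = {("Collect premium", "payment service", "used by")}

ok = conforms(Q_ok, F_C, R)
reversed_ok = conforms(Q_reversed, F_C, R)
```

The second model fails because relations are directed: the metamodel only allows ‘used by’ from the service to the process.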

Analogous to the metamodel (see Section 3.1) we can now define the transitive
closure $A^*$ of a model $A$. The set of ‘two step’ model relations, $Q^2$, is defined
as:

$$\forall e_1, e_2, e_3 \in E \; \forall t_1, t_2 \in T^* : (e_1, e_2, t_1) \in Q \wedge (e_2, e_3, t_2) \in Q \Rightarrow (e_1, e_3, t_1 \otimes t_2) \in Q^2$$

Similarly, the sets $Q^n$ of ‘n step’ model relations are determined, resulting in
the transitive closure of the set of model relations:

$$Q^* = \bigcup_{n=1}^{\infty} Q^n$$

As an extension of this, we define the transitive closure of the model as
$A^* = (E, T^*, Q^*, F_C)$.

Next, we define a composition operator $\otimes_Q : Q^* \times Q^* \to Q^*$ for model
relations, defined in terms of the composition operator for relation types (again
using the projection operators $\pi_1$, $\pi_2$ and $\pi_3$):

$$q_1 \otimes_Q q_2 = (\pi_1(q_1), \pi_2(q_2), \pi_3(q_1) \otimes \pi_3(q_2))$$

in case $\pi_1(q_2) = \pi_2(q_1)$; otherwise, $q_1 \otimes_Q q_2$ is not defined.

Now we prove the following property, which says that applying the composi- 
tion operator to any model that conforms to M* always results in a new model 
that also conforms to M*. This means that not only a ‘normal’ model, that con- 
forms to M, can be extended, but also extensions to extended models conform 
to M*. 

Theorem 1. Let $M = (C, T, R)$ be a metamodel for which an associative operator
$\otimes$ for the composition of relation types exists and let $A = (E, T^*, Q, F_C)$ be
a model. Let $M^*$ and $A^*$ be the transitive closures of the metamodel and model,
respectively, with respect to the composition operator. Then: $A \models M^* \Rightarrow A^* \models M^*$.

Proof. Consider any relation $(e, e', t) \in Q^*$ from $A^*$. According to the definition
of $Q^*$, this means that $\exists n \geq 1 : (e, e', t) \in Q^n$. This further implies that there
exist $e_1, \ldots, e_{n-1} \in E$ and $t_1, \ldots, t_n \in T^*$ such that:

$$(e, e', t) = (e, e_1, t_1) \otimes_Q (e_1, e_2, t_2) \otimes_Q \cdots \otimes_Q (e_{n-1}, e', t_n)$$

with $(e, e_1, t_1), (e_1, e_2, t_2), \ldots, (e_{n-1}, e', t_n) \in Q$ and $t = t_1 \otimes t_2 \otimes \cdots \otimes t_n$. Since
$A \models M^*$ it follows that

$$(F_C(e), F_C(e_1), t_1), (F_C(e_1), F_C(e_2), t_2), \ldots, (F_C(e_{n-1}), F_C(e'), t_n) \in R^*$$

Applying $\otimes_R$ to this chain of relations results in a new relation
which belongs to $R^*$. According to the definition of $\otimes_R$, this relation is:

$$(F_C(e), F_C(e_1), t_1) \otimes_R (F_C(e_1), F_C(e_2), t_2) \otimes_R \cdots \otimes_R (F_C(e_{n-1}), F_C(e'), t_n) = (F_C(e), F_C(e'), t_1 \otimes t_2 \otimes \cdots \otimes t_n)$$

This means that $\forall e, e' \in E \; \forall t \in T^* : (e, e', t) \in Q^* \Rightarrow (F_C(e), F_C(e'), t) \in R^*$, i.e.,
$A^* \models M^*$.
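The statement of Theorem 1 can be exercised on a small example. The sketch below is an illustration rather than a proof: it assumes the weight-based type composition of Section 3.5, uses hypothetical element names (‘billing function’, ‘payment service’), and checks both the premise and the conclusion of the theorem for one model.

```python
# Weights as listed in Table 2 (Section 3.5).
WEIGHT = {"association": 1, "access": 2, "used by": 3, "realisation": 4,
          "assignment": 5, "aggregation": 6, "composition": 7}


def compose(t1, t2):
    """Type composition: the relation type with the lower weight wins."""
    return t1 if WEIGHT[t1] <= WEIGHT[t2] else t2


def closure(relations):
    """Transitive closure of a finite set of (source, target, type) triples."""
    result, frontier = set(relations), set(relations)
    while frontier:
        new = {(a, c, compose(t1, t2))
               for (a, b, t1) in frontier
               for (b2, c, t2) in relations if b == b2}
        frontier = new - result
        result |= frontier
    return result


def conforms(Q, F_C, R):
    """Definition 1: every model relation maps onto a metamodel relation."""
    return all((F_C[e1], F_C[e2], t) in R for (e1, e2, t) in Q)


R = {("application function", "application service", "realisation"),
     ("application service", "business process", "used by")}
F_C = {"billing function": "application function",   # hypothetical element
       "payment service": "application service",     # hypothetical element
       "Collect premium": "business process"}
Q = {("billing function", "payment service", "realisation"),
     ("payment service", "Collect premium", "used by")}

R_star, Q_star = closure(R), closure(Q)
premise = conforms(Q, F_C, R_star)         # A conforms to M*
conclusion = conforms(Q_star, F_C, R_star)  # A* conforms to M*

# An abstract model that skips the service conforms to M* but not to M.
Q_abstract = {("billing function", "Collect premium", "used by")}
abstract_in_M = conforms(Q_abstract, F_C, R)
abstract_in_M_star = conforms(Q_abstract, F_C, R_star)
```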

In the remainder of this section, we show that such an associative composition
operator $\otimes$ exists for the ArchiMate metamodel as described in Section 2, which
means that the transitive closure of the metamodel can be defined.

3.3 Approach for Derivation of Composition Operator 
for ArchiMate 

Figure 3 schematically shows the approach that we follow to derive a composition 
operator for relation types, which can be used to define the transitive closure of 
the metamodel. The approach consists of a constructive part and a reasoning 
part. As a starting point, we have the metamodel M as described in Section 2 
(and formalised in Section 3.1). To determine the concepts and relations in the 
metamodel, we have used subjective knowledge about modelling phenomena 
from the ‘real world’. We refer to this subjective knowledge as the ‘architectural 
semantics’ of the concepts. 

The first constructive step in our approach is to determine all possible ‘two
step’ relations between concepts, i.e., to determine $R^2$ (and $M^2$). For instance,
there is a ‘two step’ relation between application component and business role, 
which is the composition of a ‘composition relation’ and a ‘used by’ relation. 
Having determined all the ‘two step’ relations, we derive the properties of the 
composition operator. However, it is important to realise that there is in gen- 
eral no formal, objective way to obtain the relation types for the ‘two step’ 
relations. In the same way as for the relations between concepts in the original 
metamodel, the most suitable relation type is determined based on the archi- 
tectural semantics, i.e., on our knowledge of the ‘real world’. In Section 3.4 we 
describe this constructive step in more detail. If we can show that the operator 
derived from the ‘two-step’ relations is associative, it can be used to derive the 
multiple-step relations and, ultimately, the transitive closure of the metamodel, 
$M^*$. Section 3.5 describes this formal step in more detail.

3.4 Derivation of ‘Two Step’ Relations
for the ArchiMate Metamodel

The core of the ArchiMate metamodel, as shown in Figure 1, has a set C con- 
sisting of 14 concepts and the following set T of (directed) relation types: 

$T$ = {association, realisation, used by, composition, assignment, aggregation, access}

The set of relations is determined by all the arrows in the metamodel: for example,
elements of $R$ include (application service, business behaviour, used by)
and (business actor, business role, assignment). All possible ‘two step’ relations
can be determined by means of the square of the adjacency matrix of the graph
that represents the metamodel. Figure 4 shows an example. There is a ‘two step’
relation between the Application function and Business process, which is the
composition of the realisation and the used by relations. In this case, based on
our modelling experience (semantics), it makes sense to say that the Application
function (which realises the Application service) is also used by the Business
process; therefore, we state that the tuples (application function, application service,
realisation) and (application service, business process, used by) result in
(application function, business process, used by). For this specific example, realisation
$\otimes$ used by = used by. Furthermore, by looking at all other combinations of two
relations having the types realisation and used by, respectively, we have observed
that in all these cases a relation of the type used by makes sense as the result of
the operator. This allows us to conclude that for the ArchiMate metamodel,
in general, realisation $\otimes$ used by = used by.
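The matrix-squaring step can be sketched in a few lines. The example below is an illustration on the three concepts of Figure 4 only, not on the full 14-concept metamodel; the matrix records whether a direct relation exists and ignores its type, which is a simplification made here.

```python
concepts = ["application function", "application service", "business process"]

# Direct relations from the Figure 4 example (Section 3.4).
relations = [
    ("application function", "application service", "realisation"),
    ("application service", "business process", "used by"),
]

index = {name: i for i, name in enumerate(concepts)}
n = len(concepts)

# A[i][j] == 1 when a direct relation runs from concept i to concept j.
A = [[0] * n for _ in range(n)]
for source, target, _relation_type in relations:
    A[index[source]][index[target]] = 1

# (A squared)[i][j] counts the two-step paths from concept i to concept j.
A2 = [
    [sum(A[i][k] * A[k][j] for k in range(n)) for j in range(n)]
    for i in range(n)
]

two_step_pairs = [
    (concepts[i], concepts[j])
    for i in range(n)
    for j in range(n)
    if A2[i][j] > 0
]
```

Each pair found this way still needs a relation type, which, as described above, is chosen on the basis of the architectural semantics.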

Note that we actually determine the relation between arbitrary instances of 
these concepts: only at the instance level, the relations have a meaning in reality. 
This supports the conjecture that it is also allowed to apply the composition op- 
erator at the model level, not only at the metamodel level. Section 3.2 elaborates 
on this. 




Fig. 3. Approach. Fig. 4. Example of a ‘two step’ relation. 

We determined all the individual ‘two step’ relations in the metamodel.
Not all the combinations of relation types occur in the metamodel $M$; in fact,
we have determined $\otimes_R$ for $R \times R \subseteq R^* \times R^*$. Successive construction of the
composed relations in $R^2$, $R^3$, etc. would ultimately yield $R^*$ and the operator $\otimes_R$.
We first concentrate on the composition operator for relation types, $\otimes$, because
the operators $\otimes_R$ and $\otimes$ are strongly linked. In order to obtain a complete
description of $\otimes$ we also consider the missing combinations of relation types.
With the seven relation types that we have identified, this results in a table with
$7 \times 7 = 49$ entries. A fragment of this table is shown below. The entries in italics
are combinations that do not occur in the metamodel $M$.

The example of a ‘two step’ relation in Figure 4 clarifies how to read the table.
Since the relations in the ArchiMate metamodel have a direction, the relations
described in the table correspond to this direction. In Figure 4, $t_1$ and $t_2$
correspond to the relations realisation and used by, respectively; the resulting
composed relation is represented by the third column, $t_3$, which in this example
is a used by relation.

3.5 Definition of the Composition Operator 
for the ArchiMate Metamodel 

The first thing that can be noticed from Table 1 is that for any pair of relation
types $t_1$ and $t_2$, the composition $t_1 \otimes t_2$ always equals either $t_1$ or $t_2$: this implies
that the composition operator is closed within $T$, and thus $T^* = T$.

Further observation discloses that a total order of the relation types can be
recognised. A way to represent this total order is to define a ‘weighing’ function
$W : T^* \to \mathbb{N}$ (where $\mathbb{N}$ is the set of natural numbers), as shown in Table 2.



Table 1. Composed ‘two step’ relation types; the dots indicate that only a part of
the table is presented.

t1            t2            t3
Association   Access        Association
              Aggregation   Association
              Composition   Association
              Assignment    Association
              Association   Association
              Realisation   Association
              Used by       Association
Realisation   Access        Access
              Aggregation   Realisation
              Composition   Realisation
...           ...           ...

Table 2. Relations and their weights.

t             W(t)
Association   1
Access        2
Used by       3
Realisation   4
Assignment    5
Aggregation   6
Composition   7



By means of these weights the composition operator can be defined as follows:

$$W(t_1 \otimes t_2) = \min(W(t_1), W(t_2))$$

where ‘min’ is the traditional minimum operator for (integer) numbers. Informally
speaking, we can say that the weight function determines the ‘strength’
of a relation; the composition of relations then always results in the ‘weakest’
relation. If we define

$$W_2 : T^* \times T^* \to \mathbb{N} \times \mathbb{N}; \quad W_2(t_1, t_2) = (W(t_1), W(t_2))$$

then the composition operator can be defined as

$$\otimes = W^{-1} \circ \min \circ W_2$$

where $\circ$ denotes the traditional mathematical composition operator.

This operator is a formal representation of the composition table (Table 1).
For example, assume that $t_1$ is association. Because association has the lowest
weight of all relation types, $t_1 \otimes t_2$ is always association, regardless of what $t_2$ is.
Because the minimum operator is associative, our composition operator is also
associative (we omit the formal proof of this, which is rather straightforward
when using the explicit formulation of the composition operator).
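Since there are only seven relation types, closure and associativity of the weight-based operator can also be checked exhaustively. The following sketch uses the weights of Table 2; the function and variable names are assumptions made for illustration, and the exhaustive check is a sanity test, not a replacement for the formal proof.

```python
from itertools import product

# Weights as listed in Table 2.
WEIGHT = {"association": 1, "access": 2, "used by": 3, "realisation": 4,
          "assignment": 5, "aggregation": 6, "composition": 7}
TYPES = list(WEIGHT)


def compose(t1: str, t2: str) -> str:
    """Return the relation type with the minimum weight (the 'weakest')."""
    return min(t1, t2, key=WEIGHT.__getitem__)


# Closure: the result is always one of the two operands.
closed = all(compose(a, b) in (a, b) for a, b in product(TYPES, repeat=2))

# Associativity, checked over all 7 * 7 * 7 triples.
associative = all(
    compose(compose(a, b), c) == compose(a, compose(b, c))
    for a, b, c in product(TYPES, repeat=3)
)

# Two rows of Table 1 reproduced by the operator.
row_access = compose("realisation", "access")
row_aggregation = compose("realisation", "aggregation")
```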

These properties of $\otimes$ fulfil the condition that is needed to be able to construct
the transitive closure of our metamodel. We do not explicitly need to
determine ‘n step’ relations, because these can be derived by applying the ‘two
step’ composition repeatedly (in an arbitrary order).
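As a worked illustration of the arbitrary order, consider the three-step chain composition, used by, assignment, which also occurs in Figure 5. With the weights of Table 2 (7, 3 and 5, respectively), both bracketings give the same result:

```latex
\begin{align*}
(\text{composition} \otimes \text{used by}) \otimes \text{assignment}
  &= \text{used by} \otimes \text{assignment} = \text{used by},\\
\text{composition} \otimes (\text{used by} \otimes \text{assignment})
  &= \text{composition} \otimes \text{used by} = \text{used by},
\end{align*}
\text{since } \min(\min(7, 3), 5) = \min(7, \min(3, 5)) = 3 = W(\text{used by}).
```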

3.6 Consequences for Metamodel Modifications and Extensions 

Thus far, we have been successful in deriving an explicit composition operator
(and thus the transitive closure, $M^*$) for the ArchiMate metamodel. This raises
the question of what happens to the composition operator if the metamodel is
modified.

In this paper, we intentionally separated the formal approach to arrive at
$\otimes$ (and thus $M^*$) from the actual determination of this operator for the case of
the ArchiMate metamodel. Both approaches remain valid for a modified version
of the metamodel. However, changes might influence the outcome of the composition
operator $\otimes$. For the ArchiMate metamodel, the composition operator
corresponding to $M^2$ turned out to have the desired associativity property, which
means that the transitive closure $M^*$ can be determined immediately. Also, it
was possible to express the operator in a closed form. For the composition operator
in an arbitrary metamodel, this is not necessarily the case.

During the initial design of the ArchiMate language no explicit attention 
was paid to any desired formal properties of the metamodel itself. Emphasis was 
put on the applicability of the language. Therefore, the composition operator
$\otimes$ is not enforced by design, but follows by uncovering the inherent properties
of the metamodel. The composition operator has been derived by construction.
This means that for any change or extension of the metamodel, parts of this 
construction have to be repeated. One might argue that conservation of the 
composition operator may become a design principle for metamodel extensions. 

4 Example Applications 

In this section we illustrate how composition of relations, as derived in this paper, 
can be used in architectural modelling practice. In Section 4.1 we show how the
composition operator is used to extend the relations allowed by the metamodel,
thus allowing for more flexible modelling. Section 4.2 illustrates the use of the 
composition operator as the basis for automatically generating model views and 
visualisations. 

4.1 Modelling 

Figure 5 shows a typical example of a high-level architectural model that we 
would like to express in our language. It shows that an end-user (modelled as a 
business actor) makes use of an application; this application makes use of two 
(application) services realised by supporting applications. One of the supporting
applications, the ‘customer database’, ‘accesses’ a data object which represents
the database content.

Formally, this model does not conform to the original ArchiMate metamodel 
as presented in Figure 1: e.g., in the metamodel, there is no ‘used by’ rela- 
tion between the Application component (here end-user application) and the 
Business actor (here end-user) concepts, nor an ‘access’ relation between an Ap- 
plication component and a Data object. However, in the transitive closure of the 
ArchiMate metamodel, these relations do exist. In Figure 5, we specify how the 
relations are composed from the relations in the original metamodel. (Between 
parentheses, the intermediary concepts that have been omitted are shown.) 

If we adhered strictly to the original metamodel, we would always
be forced to include all the intermediary concepts in our models. Because of
the definition of the transitive closure, we can now also create a wide range of
more abstract models like the one shown in Figure 5, without losing a precise
definition of the meaning of the models.
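The composed relations annotated in Figure 5 can be recomputed by folding the weight-based operator over each chain of relation types. This is a small sketch using the weights of Table 2; `compose_chain` and `figure5_chains` are hypothetical names introduced here.

```python
from functools import reduce

# Weights as listed in Table 2 (Section 3.5).
WEIGHT = {"association": 1, "access": 2, "used by": 3, "realisation": 4,
          "assignment": 5, "aggregation": 6, "composition": 7}


def compose(t1: str, t2: str) -> str:
    """Type composition: the relation type with the lower weight wins."""
    return t1 if WEIGHT[t1] <= WEIGHT[t2] else t2


def compose_chain(types):
    """Compose a non-empty chain of relation types from left to right."""
    return reduce(compose, types)


# Chains and results as annotated in Figure 5.
figure5_chains = {
    ("composition", "used by", "assignment"): "used by",
    ("used by", "assignment"): "used by",
    ("assignment", "realisation"): "realisation",
    ("assignment", "realisation", "access"): "access",
}

results = {chain: compose_chain(chain) for chain in figure5_chains}
```

Because the operator is associative, folding from the left is only one of several equivalent evaluation orders.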

4.2 Automated Abstraction and Visualisation 

Integrated architectural descriptions may become very extensive and complex.
For the presentation of architectural descriptions to specific stakeholders (many
of whom are not modelling experts), it may be useful to select only those aspects
of a complex model that are relevant for their concerns, and visualise these
aspects in a way that appeals to them.

For this purpose, the relevant information has to be extracted from the model. 
The notion of viewpoints, as defined in IEEE Standard 1471 [8], explicitly ad- 
dresses this issue. Each stakeholder has its own concerns, which require a specific 
view on, and corresponding presentation of, the model. A viewpoint description 
addresses all of these issues. In most cases, this requires abstracting from certain 
details. The composition operator aids in (automated) abstraction of models. 

Consider the following example model for an insurance company that de- 
scribes the realisation of two organisational services offered to customers: ‘Insur- 
ance selling’ and ‘Premium collection’ (see Figure 6). The organisational services 
are realised by the business processes ‘Take out insurance’ and ‘Collect pre- 
mium’, respectively. These business processes make use of application services 
realised by application components. The business process ‘Collect premium’ also
uses ‘Insurance collection’ as a supporting organisational service, to obtain the
information needed to calculate the amount to be charged.

[Figure 5: elements Customer, End-user application, Supporting service, Support
application and Customer database, with the following composed relations
(omitted intermediary concepts between parentheses):
  composition $\otimes$ used by $\otimes$ assignment = used by (application interface, business role)
  used by $\otimes$ assignment = used by (application function)
  assignment $\otimes$ realisation = realisation (application function)
  assignment $\otimes$ realisation $\otimes$ access = access (application function, application service)]

Fig. 5. Example of a model.

Fig. 6. Insurance example.

Assume that for a certain stakeholder it is relevant to know the mutual dependence
of organisational services and business processes on application components.
A landscape map [15] with business processes and organisational services
on the axes is a useful view for addressing these concerns. The construction of
a landscape map is straightforward: an application component C covers a ‘cell’
(P, S) defined by business process P and organisational service S if and only if:
(1) C is ‘used by’ P, and (2) P ‘realises’ S.
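The two conditions amount to a join of the ‘used by’ and ‘realises’ relations on the business process. A minimal sketch follows; the process and service names are taken from the insurance example, the ‘Accounting’ component is mentioned in the text, and ‘Policy administration’ is a hypothetical component added for illustration.

```python
from typing import Dict, Set, Tuple


def landscape_map(used_by: Set[Tuple[str, str]],
                  realises: Set[Tuple[str, str]]
                  ) -> Dict[Tuple[str, str], Set[str]]:
    """Map each cell (process, service) to the components that cover it."""
    cells: Dict[Tuple[str, str], Set[str]] = {}
    for component, process in used_by:
        for realising_process, service in realises:
            if process == realising_process:
                cells.setdefault((process, service), set()).add(component)
    return cells


# (component, process): component is 'used by' process.
used_by = {
    ("Accounting", "Collect premium"),
    ("Policy administration", "Take out insurance"),  # hypothetical component
}

# (process, service): process 'realises' service.
realises = {
    ("Take out insurance", "Insurance selling"),
    ("Collect premium", "Premium collection"),
}

cells = landscape_map(used_by, realises)
```

The ‘used by’ pairs passed in here are the derived, composed relations between components and processes, not necessarily relations drawn in the original model.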

A useful intermediary step to produce such a landscape map is to derive 
the needed implicit relations between application components and business pro- 
cesses, by abstracting from the application services (and, for the ‘used by’ re- 
lation between the ‘Accounting’ component and the ‘Collect premium’ process, 
also from the intermediary business process and organisational service). When 
applying the composition operator on the model in Figure 6, these implicit re- 
lations can easily be derived, yielding the model in Figure 7. In this case, this 
results in the landscape map as shown in Figure 8. 




Fig. 7. Derived ‘used by’ relations. 



Fig. 8. Landscape map.

Although the above example is fairly simple, the automated construction
of landscape views can obviously be performed on arbitrarily complex models.
Moreover, abstraction from details in a model by means of the composition
operator can serve as an intermediary step for a wide variety of visualisation
and analysis techniques.

5 Related Work 

The composition rule we present in this paper can, among other things, be ap- 
plied for automated abstraction and complexity reduction of models for, e.g., the 
creation of stakeholder-oriented visualisations. In this section we refer to several 
papers that focus on such transformational processes and/or semantics that can 
be applied to diagrammatic specifications of models for automatic abstraction. 

ArchiMate is ultimately a visual modelling language and we found several relevant
approaches in the area of visual modelling languages. There is a growing
interest in the visual language community in metamodelling approaches for the
definition (in UML) of the semantics of visual languages [3]. Most of these approaches
do not map the original model elements into a symbolic form, but rather
into a graphical form (usually that of graphs composed of complex attributed
graphical-type symbols; see [3]). Graph transformations are then defined and applied
to this graphical form. Graph transformations and graph rewriting systems
are used, for instance, in GenGEd [1] and DiaGen [12, 10] to produce progressively
more abstract and simple diagrams. In contrast, our approach provides support
for a transformational process that applies to the symbolic form of the model.
The result of this process is then visualised by means of a mapping to graphical
constructs.

Another interesting approach that relates to our idea regarding the definition
of an operator on the metamodel relation set is that of Holt [7]. Holt is
concerned with architectures that can be characterised as directed graphs. In
such architectures, the graph nodes represent entities (e.g. procedures, modules
and subsystems), while the various types of relations between entities (e.g.
‘calls’, ‘accesses’, ‘imports’, ‘includes’, and ‘uses’) are mapped to the ‘typed’ or
‘coloured’ edges of the graph. Further on, Holt shows how a binary relational
algebra (cf. Tarski [14]; Schmidt and Ströhlein [13]) can be used to give constraints
among the relations in an architecture. In a completely different domain
(geographical applications), a binary relational algebra [14, 11] is also referenced
by Egenhofer [5] as a basis for the definition of a composition table for
eight different topological binary relations. The idea of building such a composition
table is also present in our approach (see Table 1). We have explicitly
formalised this composition table, in the form of a composition operator that
can be used at both metamodel and model level.

6 Summary and Conclusions 

In this paper a generic approach for deriving an operator for composition of
relations in an architectural description language is presented, given the architectural
semantics of the underlying metamodel. We applied this approach to
a simplified version of the ArchiMate metamodel, resulting in a composition
operator that can be expressed in closed form. This operator has a number of
desirable properties: it is closed with respect to the set of relations and associative.
The associativity allows for the construction of the transitive closure of the
ArchiMate metamodel with respect to the relations.

We have shown in Section 3 how the composition operator $\otimes_R$ yields the
transitive closure of a metamodel, $M^*$. This transitive closure extends the set
of possible relations between concepts, thus providing more modelling flexibility.
In Section 3.2, we have extended the result to existing models by deriving an
operator $\otimes_Q$. We have proved that, given a model that conforms to $M^*$, the
composition of relations in that model results in a model that conforms to $M^*$.
Note that this includes models that conform to $M$. Figure 9 summarises this.



Fig. 9. Summary.



The composition operator can be put to use in several relevant applications
of architectural models. In this paper, we have presented two examples.
First, it is now formally allowed to abstract from certain details by omitting
intermediary concepts; thus, the operator creates more freedom in the construction
of formal models. In practice most architects create formal models only for
specific domains, and intuitively construct models that cross domains. With the
approach suggested in this paper, it is possible to create cross-domain models
that retain a formal meaning, as shown in Section 4.1. Second, we demonstrated that
the composition rule can be successfully applied for the (automated) abstraction
of complex models, which can form the basis of, e.g., stakeholder-specific
visualisations.

We are currently exploring several other applications of the composition rule.
Among these, certain types of analyses appear to be obvious application areas. For
instance, complexity reduction of the model may be used as ‘preprocessing’ to
improve the efficiency of certain analysis algorithms. Also, quantifying relations,
and showing how composition affects the quantitative attributes, may be used
in certain types of quantitative analysis. This is a topic of current research.

Finally, we remark that the formal description or study of metamodel properties
appears to be quite novel in the context of architecture description languages.
In our view, metamodels might become much more expressive and powerful if
more attention were paid to metamodel properties in the design phase of the
metamodel. The elegant but powerful property of the composition operator for
relations has at least convinced us to use the preservation of this property as a
design principle for changes to our metamodel.

Abstract. In this work we introduce event-driven grammars, a kind of graph
grammar that is especially suited for visual modelling environments generated
by meta-modelling. Rules in these grammars may be triggered by user actions
(such as creating, editing or connecting elements) and in their turn may trigger
other user-interface events. Their combination with (non-monotonic) triple graph
grammars allows constructing and checking the consistency of the abstract syntax
graph while the user is building the concrete syntax model. As an example of
these concepts, we show the definition of a modelling environment for UML
sequence diagrams, together with event-driven grammars for the construction of
the abstract syntax representation and consistency checking.



1 Introduction 

Traditionally, visual modelling tools have been generated from descriptions of the
Visual Language (VL) given either in the form of a graph grammar [2] or as a
meta-model [6]. In the former approach, one has to construct either a creation or a parsing
grammar. The first kind of grammar gives rise to syntax-directed environments, where
each rule represents a possible user action (the user selects the rule to be applied). The
second kind of grammar (for parsing) tries to reduce the model to an initial symbol
in order to verify its correctness, and results in environments that allow freer editing. Both
kinds of grammars are in effect encodings of a procedure to check the validity of a model.

In the meta-modelling approach, the VL is defined by building a meta-model. This
is a kind of type graph with multiplicities and other (possibly textual) constraints.
Most of the time, the concrete syntax is given by assigning graphical appearances to
both classes and relationships in the meta-model [6]. For example, in the AToM³ tool,
this is done by means of a special attribute that both classes and relationships have. In
this approach the relationship between concrete (the appearances) and abstract syntax
(the meta-model concepts) is one-to-one. The meta-modelling environment has to check
that the model built by the user is a correct instance of the meta-model. This is done
by finding a typing morphism between model and meta-model, and by checking the
defined constraints on the model. In any case, whereas the graph-grammar approach is
more procedural, the meta-modelling approach is more declarative.

In this paper we present a novel approach that combines the meta-modelling and
the graph grammar approaches for the definition of VLs. To overcome the restriction of a
one-to-one mapping between abstract and concrete syntaxes, we define separate
meta-models for both kinds of syntax. In the general case, both kinds of models can be very
different. For example, in the definition of UML class diagrams [12], the meta-model
defines concepts Association and AssociationEnd which are graphically represented
together as a single line. In general, one can have abstract syntax concepts which are not
represented at all or are represented with a number of concrete syntax elements; finally,
concrete syntax elements without an abstract syntax representation are also possible. To
maintain the correspondence between abstract and concrete syntax elements, we create
a correspondence meta-model whose nodes have pairs of morphisms to elements of the
concrete and abstract meta-models.

The concrete syntax part works in the same way as in the pure meta-modelling
approach, but we define (non-monotonic) triple graph grammar rules [11] to build the
abstract syntax model, and check the consistency of both kinds of models. The novelty
is that we explicitly represent the user interface events in the concrete syntax part of the
rules (creating, editing, connecting, moving, etc.). Events can be attached to the concrete
syntax elements to which they are directed. In this way, rules may be triggered by user
events, so we can use graph grammar rules in a free editing system. Additionally, we
take advantage in the rules of the inheritance structure defined in the meta-model, and
allow the definition of abstract (triple) rules [3]. These have abstract nodes (instances
of abstract classes in the meta-model) in the LHS or RHS. These rules are equivalent to
a number of concrete rules obtained from the valid substitutions of the abstract nodes
by concrete ones (instances of the derived classes in the meta-model). We extend this
concept to allow refinement of relationships.

As a proof-of-concept, we present a non-trivial example, in which we define the 
concrete and abstract syntax of sequence diagrams, define a grammar to maintain the 
consistency of both syntaxes, and define additional rules to check the consistency of the 
sequence diagram against existing class diagrams. 



2 Meta-modelling in AToM³

AToM³ [6] is a meta-modelling tool that was developed in collaboration with Hans
Vangheluwe from McGill University. The tool allows the definition of VLs by means of
meta-modelling and model manipulation by means of graph transformation rules. The
meta-modelling architecture is linear, and a strict meta-modelling approach is followed,
where each element of the meta-modelling level n is an instance of exactly one element
of the level n + 1 [1].

Figure 1 shows an example with three meta-modelling levels. The upper part shows 
a meta-metamodel for UML class diagrams, very similar to a subset of the core pack- 
age of the UML 1.5 standard specification. The main difference is that Associations can 
also be refined, and that the types of attributes are specific AToM³ types. Some of the
concepts in this meta-metamodel are Power types [10], whose instances at the lower
meta-level inherit from a common class. This is the case of Class, Association and 
AssociationEnd, whose instances inherit from ASGNode and ASGConnection. Classes 
ATOM3AppearanceIcon, ATOM3AppearanceSegment and ATOM3AppearanceLink are 
special types, which provide the graphical appearance of classes, association ends and
associations. They are also Power types, as their instances inherit from abstract classes
Entity, LinkSegments and Link. The user can define the visual appearance of these instances
with a graphical editor. Instances of ATOM3AppearanceIcon are icon-like, and
they may include primitive forms such as circles, lines, text and show attribute values
of the object associated with the instance through relationship Appearance. Instances
of ATOM3AppearanceLink are similar to the previous one, but are associated with two
ATOM3AppearanceSegment instances, which represent the incoming and outgoing segments
to the link (which is itself drawn in the centre). Finally, the ATOM3Attribute class
implements a special kind of attribute type, which is an instance of itself. In this way
one can have arbitrary meta-modelling layers.

[Figure 1: three meta-modelling levels. Panel labels: “AToM³ Meta-metamodel (Partially shown)”, “Base Classes for Visual Appearance”, “Sequence Diagrams”.]

Fig. 1. Meta-modelling levels in AToM³.

The second level in Figure 1 shows a part of the meta-model defined in Figure 4 
(the lower part), but using an abstract syntax form (instead of the common graphical 
appearance of UML class diagrams that we have used in the upper meta-metamodel) 
where we indicate the elements of the upper meta-level from which they are instances. 
Only two classes are shown, ActivationBox and Object, together with the attributes for 
defining their appearances. In AToM³, by default, the name of the appearance associated
with a class or association begins with “Graph_” followed by the name of the class or
association (that is, the name attribute defined in ModelElement is filled automatically).
In the case of an AssociationEnd instance, it is similar, but followed by an “S” or “T”,
depending on whether the end is source or target.
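
As a small illustration of this naming convention, the following Python sketch builds the default appearance name. The function name and its `end` parameter are hypothetical (they are not part of the AToM³ API), and the underscore separator is assumed from the figure labels such as Graph_Object.

```python
def appearance_name(element_name, end=None):
    """Default appearance name for a class, association or association end.

    `end` is None for classes and associations, or "source"/"target" for an
    AssociationEnd. Both the function and the parameter are illustrative
    assumptions, not AToM3 API.
    """
    suffix = {None: "", "source": "S", "target": "T"}[end]
    return "Graph_" + element_name + suffix


names = [
    appearance_name("Object"),
    appearance_name("LifeLine", "source"),
    appearance_name("LifeLine", "target"),
]
```

With these inputs the sketch yields the appearance names used in the lowest level of Figure 1 for the object and the two ends of the life line.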

Finally, the lowest meta-level shows to the left (using an abstract syntax notation)
a simple sequence diagram model. To the right, the same model is shown, using
a visual representation, taking the graphical appearances designed for Graph_Object,
Graph_ActivationBox, Graph_LifeLine, Graph_LifeLineS and Graph_LifeLineT. Note
how the graphical forms are in a one-to-one correspondence with the non-graphical
elements (Object1, LL1, LLS1, LLT1 and ABox1). The non-graphical elements can be
seen as the abstract syntax and the graphical ones as the concrete syntax. Nonetheless, 
as stated in the introduction, the one-to-one relationship is very restrictive. Therefore 
we propose building two separate meta-models, one for the concrete syntax represen- 
tation (whose concepts are the graphical elements that the user draws on the screen) 
and another one for the abstract syntax. Both of them are related using a correspon- 
dence graph. The user builds the concrete syntax model, and a (triple, event-driven) 
graph grammar builds and checks the consistency of the abstract syntax model. These 
concepts are introduced in the following section. 



3 Non-monotonic, Abstract Triple Graph Grammars 

Triple Graph Grammars were introduced by Schürr [11] as a means to specify translators
of data structures, check consistency, or propagate small changes of one data
structure as incremental updates into another one. Triple graph grammar rules model
the transformations of three separate graphs: source, target and correspondence graphs.
The latter has morphisms from each node into source and target nodes. These concepts
can be defined as follows (taken from [11])¹:

Definition 1 (Graph Triple) Let $CONC$, $ABST$ and $LINK$ be three graphs and
$gs : LINK \to CONC$, $gt : LINK \to ABST$ be two morphisms. The resulting graph
triple is denoted as: $CONC \xleftarrow{gs} LINK \xrightarrow{gt} ABST$.

¹ Due to space limitations, we have omitted all proofs of the constructions we introduce.

Morphisms $gs$ and $gt$ represent m-to-n relationships between the $CONC$ and $ABST$
graphs via $LINK$ in the following way: $x \in CONC$ is related to $y \in ABST$ $\iff$
$\exists z \in LINK \mid x = gs(z)$ and $y = gt(z)$.
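
The relation induced by the two morphisms can be sketched in Python as follows. The encoding is an assumption made only for illustration: $gs$ and $gt$ are dictionaries from LINK nodes to CONC and ABST nodes, and the node names are hypothetical. The example mirrors the situation described later in the paper, where a single message symbol corresponds to three abstract syntax elements.

```python
def related_pairs(link_nodes, gs, gt):
    """Return all pairs (x, y) such that some z in LINK has gs(z) = x and gt(z) = y."""
    return {(gs[z], gt[z]) for z in link_nodes}


# Hypothetical graph triple: one concrete arrow related to three abstract nodes.
link_nodes = {"z1", "z2", "z3"}
gs = {"z1": "msg_arrow", "z2": "msg_arrow", "z3": "msg_arrow"}
gt = {"z1": "Message", "z2": "Stimulus", "z3": "Action"}

pairs = related_pairs(link_nodes, gs, gt)
```

Because several LINK nodes may share the same image under $gs$ or $gt$, the resulting relation is m-to-n in general, not one-to-one.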

In [11] triple graph grammars were defined following the single pushout [7] (SPO)
approach and were restricted to be monotonic (the LHS of each rule must be included in its RHS).
In this way, only two morphisms were needed from the RHS of the LINK graph to the
RHS of the CONC and ABST graphs. Morphisms in the LHS are thus defined as a restriction
of the morphisms in the RHS. Here we use the double pushout approach [7] (DPO)
with negative application conditions (NAC) in rules and do not impose the restriction of
monotonicity. Hence, we have to define two morphisms from both the LHS and the RHS of the
correspondence graph rule to the LHS and RHS of the CONC and ABST graphs.

Definition 2 (Triple Rule) Let $sp = (SL \xleftarrow{sl} SK \xrightarrow{sr} SR)$, $cp = (CL \xleftarrow{cl} CK \xrightarrow{cr} CR)$
and $tp = (TL \xleftarrow{tl} TK \xrightarrow{tr} TR)$ be three rules. $NAC = \{(NS \leftarrow NC \rightarrow NT, n)\}$
is a set of tuples where the first component is a graph triple and $n$ is a triple
$(ns : SL \to NS, nc : CL \to NC, nt : TL \to NT)$ of injective graph morphisms.
Furthermore, let $ls : CL \to SL$, $rs : CR \to SR$, $lt : CL \to TL$ and $rt : CR \to TR$
be four graph morphisms, such that they coincide in the elements of $CK$ as follows:
$\forall k_1 \in CK, \exists k_2 \in SK, ls(cl(k_1)) = sl(k_2) \wedge rs(cr(k_1)) = sr(k_2)$ ² (and analogously
for the elements of $TK$). The resulting triple rule (see Figure 2) is defined as follows:

$p = (sp \xleftarrow{ls, rs} cp \xrightarrow{lt, rt} tp, NAC)$.




Fig. 2. A triple rule. 



Figure 3 shows an example of two triple rules (where the dashed arrows depict 
morphisms ls, rs, lt and rt) with NACs, where only the additional elements to LHS
and their context have been depicted. NACs have the usual meaning: if a match is found
in the triple graph (which commutes with the LHS match and n), the rule cannot be
applied. The kernel parts SK, CK and TK of the rules are not explicitly shown, but
their elements have the same numbers in LHS and RHS. This is the notation that we 
use throughout the paper. For our purposes, we need to extend the previous definition 
of triple grammars to include attributes. This can be done in the way shown in [11], 



² Which is equivalent to $\exists cs : CK \to SK$ such that $ls \circ cl = sl \circ cs$.

Fig. 3. An Example with two Triple Rules.



For the approach to be useful in meta-modelling environments, graphs must be con- 
sistent with a meta-model. We model this by defining typing morphisms between graphs 
and type graphs. We use the concept of type graph with inheritance³ as defined in [3]:

Definition 3 (Type Graph with Inheritance, taken from [3]) A type graph with inheritance
is a triple $(TG, I, A)$ of graphs $TG$ and $I$ sharing the same set of nodes $N$, and
a set $A \subseteq N$, called abstract nodes. For each node $n$ in $I$ the inheritance clan is defined
by $clan_I(n) = \{ n' \in N \mid \exists \text{ path } n' \xrightarrow{*} n \text{ in } I \}$ where paths of length 0 are included, i.e.
$n \in clan_I(n)$.
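
A minimal Python sketch of the inheritance clan follows, assuming that the inheritance graph $I$ is encoded as a set of (child, parent) pairs; this encoding and the function name are illustrative choices. The example hierarchy uses the base classes mentioned in this paper (Entity and Link inherit from VisualObject, and icon appearances such as Graph_Object inherit from Entity).

```python
def clan(inheritance_edges, n):
    """Return all nodes n' with a path n' ->* n in I, including n itself."""
    result = {n}
    frontier = [n]
    while frontier:
        current = frontier.pop()
        for child, parent in inheritance_edges:
            if parent == current and child not in result:
                result.add(child)
                frontier.append(child)
    return result


# Edges of I as (child, parent) pairs; the encoding is an assumption of this sketch.
I_EDGES = {
    ("Entity", "VisualObject"),
    ("Link", "VisualObject"),
    ("Graph_Object", "Entity"),
}

visual_clan = clan(I_EDGES, "VisualObject")
link_clan = clan(I_EDGES, "Link")
```

The clan of a node without subtypes contains only the node itself, which corresponds to the path of length 0 in the definition.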

For the typing of a graph triple, we have to define meta-models for the CONC, 
ABST and LINK graphs. Additionally, as LINK has morphisms to CONC and 
ABST, we have to include information about the valid morphisms in the meta-model
for the LINK graph. Thus, we define a meta-model triple in the following way: 

Definition 4 (Meta-model triple) A meta-model triple is a triple of type graphs with
inheritance, together with two morphisms ($cs$ and $ct$) between nodes of one of the type
graphs to the other two: $MMT = ((TG^{CONC}, I^{CONC}, A^{CONC}), (TG^{LINK}, I^{LINK}, A^{LINK}),
(TG^{ABST}, I^{ABST}, A^{ABST}), cs, ct)$ where $cs : TG^{LINK} \to TG^{CONC}$
and $ct : TG^{LINK} \to TG^{ABST}$.

Figure 4 shows an example meta-model triple, which in the upper part (abstract syn- 
tax) depicts a slight variation of the UML 1.5 standard meta-model proposed by OMG 
for sequence diagrams. We have collapsed the triples (TG, I, A) into a unique graph,
where the I graph is shown with hollow edges (following the usual UML notation) and 
the elements in A are shown in italics. 

The lower meta-model in the figure declares the concrete appearance concepts and 
their relationships. The elements in this meta-model are in direct relationship with the 
graphical forms that will be used for graphical representation. As Figure 1 showed,
we allow the refinement of relationships, and this is shown with the usual notation for
inheritance, but applied to relationships (arrows in the diagram). This is just a notational
convenience, because each relationship (arrow) shown in Figure 4 is indeed an instance
of class Association in the upper meta-model in Figure 1. In this way, the inheritance
concept developed in [3] is immediately applicable to refinement of relationships.

³ In the following, we use the terms “type graph” and “meta-model” interchangeably, although the
latter may include additional constraints.

The correspondence meta-model formalizes the kind of morphisms that are allowed
from nodes of types CorrespondenceMessage and CorrespondenceObject. As it is defined,
the declared morphism types in $cs$ and $ct$ are not “inherited” through $I^{LINK}$ in
the correspondence graph meta-model.




Fig. 4. An Example Meta-model triple. 



Triple rules must be provided with typing morphisms to the meta-model triple. As 
in [3] we use the notion of clan morphism from graphs to type graphs with inheritance. 

Definition 5 (Clan Morphism, taken from [3]) Given a type graph with inheritance
$(TG, I, A)$ and a graph $G$, $type : G \to TG$ is a clan-morphism if for all $e \in G_E$,
$type_N \circ s_G(e) \in clan_I(s_{TG} \circ type_E(e))$, and similarly for $t_G$.
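
The clan-morphism condition can be checked mechanically. The sketch below is an illustration under assumed encodings: graph edges are dictionaries from edge identifiers to (source, target) pairs, the typing is given by two dictionaries, and the inheritance graph is a set of (child, parent) pairs. The small type graph (an AbsMessage relationship between ConcreteElements, with ActivationBox and Object as subtypes) is a simplified assumption inspired by the concrete syntax discussed in Section 5.

```python
def clan(inheritance_edges, n):
    """All nodes with an inheritance path to n, including n itself."""
    result, todo = {n}, [n]
    while todo:
        current = todo.pop()
        for child, parent in inheritance_edges:
            if parent == current and child not in result:
                result.add(child)
                todo.append(child)
    return result


def is_clan_morphism(g_edges, type_n, type_e, tg_edges, inheritance_edges):
    """Check that the types of each edge's source and target lie in the clans
    of the source and target of the edge's type in TG."""
    for e, (src, tgt) in g_edges.items():
        tg_src, tg_tgt = tg_edges[type_e[e]]
        if type_n[src] not in clan(inheritance_edges, tg_src):
            return False
        if type_n[tgt] not in clan(inheritance_edges, tg_tgt):
            return False
    return True


# Hypothetical type graph with inheritance.
TG_EDGES = {"AbsMessage": ("ConcreteElement", "ConcreteElement")}
INHERITANCE = {("ActivationBox", "ConcreteElement"), ("Object", "ConcreteElement")}

g_edges = {"m1": ("box1", "box2")}
type_e = {"m1": "AbsMessage"}

ok = is_clan_morphism(
    g_edges, {"box1": "ActivationBox", "box2": "Object"}, type_e, TG_EDGES, INHERITANCE
)
bad = is_clan_morphism(
    g_edges, {"box1": "ActivationBox", "box2": "Class"}, type_e, TG_EDGES, INHERITANCE
)
```

In the first call both end nodes are typed by subtypes of ConcreteElement, so the typing is accepted; in the second call the target is typed by a node outside the clan, so it is rejected.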

We can define typed graph triples in a similar way as typed rules were defined in [3], 
but constraints regarding the morphisms of the correspondence graph should also be 
given. Additionally, we can define abstract triple rules by allowing the appearance of 
abstract nodes in the LHS of each rule. If an abstract node appears in the RHS, then it must
also appear in the LHS. An abstract rule is equivalent to a number of concrete rules
where each abstract node is replaced by any concrete node in its inheritance clan. For
the application of this concept here, first note that an abstract triple rule is equivalent to
the combination of all its concrete subrules. Additionally, some of these combinations
may not be valid, because of invalid morphisms between the resulting concrete rules of
the correspondence graph and the source and target graphs.
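
The expansion of an abstract rule into concrete rules can be sketched as an enumeration of substitutions. This is only an illustration: the function, the dictionary encoding and the validity predicate are assumptions, and the predicate stands in for the check on morphisms described above. The example uses the three concrete message cases discussed in Section 5.1.

```python
from itertools import product


def concrete_substitutions(abstract_nodes, concrete_clan, is_valid=lambda s: True):
    """Yield every valid assignment of a concrete type to each abstract node.

    abstract_nodes: {node_id: abstract_type}
    concrete_clan:  {abstract_type: [concrete types in its inheritance clan]}
    is_valid:       predicate rejecting invalid combinations (a stand-in here)
    """
    ids = sorted(abstract_nodes)
    choices = [concrete_clan[abstract_nodes[i]] for i in ids]
    for combo in product(*choices):
        substitution = dict(zip(ids, combo))
        if is_valid(substitution):
            yield substitution


# Hypothetical data: source and target of an abstract message rule.
ALLOWED = {
    ("ActivationBox", "ActivationBox"),  # Message
    ("ActivationBox", "Object"),         # createMessage
    ("StartPoint", "ActivationBox"),     # startMessage
}
abstract_nodes = {"source": "ConcreteElement", "target": "ConcreteElement"}
concrete_clan = {"ConcreteElement": ["ActivationBox", "Object", "StartPoint"]}

valid = list(
    concrete_substitutions(
        abstract_nodes,
        concrete_clan,
        lambda s: (s["source"], s["target"]) in ALLOWED,
    )
)
```

Of the nine possible combinations only three survive the validity check, which illustrates why one abstract rule can replace several concrete ones while some combinations are discarded.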



Definition 6 (Typed Graph Triple) A graph triple typed by a meta-model triple
$MMT = ((TG^{CONC}, I^{CONC}, A^{CONC}), (TG^{LINK}, I^{LINK}, A^{LINK}), (TG^{ABST}, I^{ABST}, A^{ABST}), cs, ct)$
is depicted by $TriG_{MMT} = (CONC \xleftarrow{gs} LINK \xrightarrow{gt} ABST, type_C, type_L, type_A)$
where the last three components are typing clan morphisms from
$CONC$, $LINK$ and $ABST$ to the first three components of $MMT$ in which the following
conditions hold: $\forall l \in LINK$, $type_C(gs(l)) \in clan_{I^{CONC}}(cs(type_L(l)))$ and
$type_A(gt(l)) \in clan_{I^{ABST}}(ct(type_L(l)))$.

If the image of any element of the triple graph belongs to some of the $A$ sets, the
typing is called abstract, otherwise it is called concrete.



Definition 7 (Abstract Triple Rule) A triple rule typed by a meta-model triple $MMT$
(defined as before) is depicted by $TriP_{MMT} = (sp \xleftarrow{ls, rs} cp \xrightarrow{lt, rt} tp, NAC, type_{sp},
type_{cp}, type_{tp})$ where $type_{sp}$ is a triple of clan morphisms ($type^L_{sp}$, $type^K_{sp}$ and $type^R_{sp}$)
from $SL$, $SK$ and $SR$ ($sp = (SL \xleftarrow{sl} SK \xrightarrow{sr} SR)$) to $TG^S$ (and similar for
$type_{cp}$ and $type_{tp}$). Additionally, NACs are also typed as follows: $NAC = \{(NS \leftarrow
NC \rightarrow NT, n, type^N)\}$ is a set of tuples where the first two components are defined
as in definition 2 and $type^N$ is a triple of clan morphisms ($type^N_S$, $type^N_C$, $type^N_T$) from
the graph triple to $TG^S$, $TG^C$ and $TG^T$, which forms a typed graph triple with the first
component (see definition 6).

The following conditions hold for $sp$:

- $type^L_{sp} \circ sl = type^K_{sp} = type^R_{sp} \circ sr$ (typing of preserved elements does not change).

- $type^R_{sp,N}(R'_{sp,N}) \cap A^S = \emptyset$, where $R'_{sp,N} := SR_N - sr_N(SK_N)$ (new nodes in
RHS are not abstract)

- $type^N_S \circ ns \leq type^L_{sp}$ for all $(N, n, type^N) \in NAC$ (where $\leq$ is the type refinement
relationship [3]) (typing for NACs is finer than the corresponding elements in LHS)

And analogously for $cp$ and $tp$. As in the previous definition, $\forall n \in CL$, $type^L_{sp}(ls(n)) \in
clan_{I^{CONC}}(cs(type^L_{cp}(n)))$ and $type^L_{tp}(lt(n)) \in clan_{I^{ABST}}(ct(type^L_{cp}(n)))$ (and analogously
for $CK$ and $CR$).



Once we have defined the basic concepts regarding graph rules, the next section presents
event-driven grammars, which we use in combination with abstract triple rules in order 
to build the abstract syntax model associated with the concrete syntax. They are also 
useful for consistency checking, as we will see in section 5. 



4 Event-Driven Grammars 

In this section, we present event-driven grammars, as a means to formalize some of 
the user actions and their consequences when using a visual modelling tool. We have
defined event-driven grammars to model the effects of editor operations in AToM³ [6],
although other tools could also be modelled. The actions a user can perform in AToM³
are creating, editing and deleting an entity or a connection, and connecting and discon- 
necting two entities. All these events occur at the concrete syntax level. 

The main idea of event-driven grammars is to make explicit these events in the 
models. Note how this is very different from the syntax directed approach, where graph 
grammar rules are defined for VL generation. In these environments the user chooses 
the rule to be applied. In our approach, the VLs are generated by means of meta-modelling,
and the user builds the model as in regular environments generated by meta-modelling
(free-hand editing). The events that the user generates may trigger the execution
of some rules. In our approach, rules are triple rules and are used to build the
abstract syntax model and to perform consistency checks.

We have defined a set of rules (called event-generator rules, depicted as evt in Fig- 
ure 5) that models the generation of events by the user. Another set of rules (called 
action rules, depicted as sys-act in Figure 5) models the actual action triggered by the 
event (creating, deleting entities, etc.), and finally, an additional set of rules (called con- 
sume rules, depicted as del in Figure 5) models the consumption of the events once the 
action has been performed. The VL designer can define his own rules to be executed af- 
ter an event and before the execution of the action rules (depicted as pre in Figure 5), or 
after the action rules and before the consume rules (depicted as post in Figure 5). These
rules model pre- and post- actions respectively. In the pre-actions, rules can delete the 
produced events, if certain conditions are met. This is a means to specify pre-conditions 
for the event to take place. Additionally, in the post-actions, rules can delete the event 
and undo its actions, which is similar to a post-condition. The working scheme of an 
event-driven grammar is shown in Figure 5. All the sets of rules (except the ones in evt,
which just produce a user event) are executed as long as possible.
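
The working scheme can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the AToM³ implementation: a model is a plain dictionary, a rule is a function that modifies the model and returns True if it was applied, and all rule names are hypothetical. The `handled` flag plays the role that application conditions would play in a real grammar, preventing a rule from being applied to the same event twice.

```python
def apply_as_long_as_possible(rules, model):
    """Apply the rules of one set repeatedly until none of them is applicable."""
    applied = True
    while applied:
        applied = any(rule(model) for rule in rules)


def process_event(model, evt_rule, pre, sys_act, post, delete):
    """One pass of the scheme: evt once, then pre, sys-act, post and del."""
    evt_rule(model)  # the event-generator rule just produces one user event
    for rule_set in (pre, sys_act, post, delete):
        apply_as_long_as_possible(rule_set, model)


def evt_create(model):
    model["events"].append({"kind": "create", "type": "Object", "handled": False})
    return True


def act_create(model):
    for event in model["events"]:
        if event["kind"] == "create" and not event["handled"]:
            model["entities"].append(event["type"])
            event["handled"] = True
            return True
    return False


def consume(model):
    if model["events"]:
        model["events"].pop()
        return True
    return False


def pre_block_all(model):
    # A pre-action rule that deletes pending events, acting as a failed pre-condition.
    if model["events"]:
        model["events"].clear()
        return True
    return False


accepted = {"events": [], "entities": []}
process_event(accepted, evt_create, pre=[], sys_act=[act_create], post=[], delete=[consume])

rejected = {"events": [], "entities": []}
process_event(rejected, evt_create, pre=[pre_block_all], sys_act=[act_create], post=[], delete=[consume])
```

In the first run the event is executed and then consumed, so the entity is created. In the second run the pre-action rule deletes the event before the action rules are tried, so nothing is created.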



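[Figure 5: $M_i \xrightarrow{evt} M_{evt} \xrightarrow{pre} M_{evt-pre} \xrightarrow{sys-act} M_{act} \xrightarrow{post} M_{act-post} \xrightarrow{del} M_f$]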



Fig. 5. Application of an event driven grammar with user-defined rules. 



In the example presented in this paper, models ($M_i$, $M_{evt}$, $M_{evt-pre}$, $M_{act}$,
$M_{act-post}$ and $M_f$ in Figure 5) are indeed typed graph triples. In this way, the sets
of rules evt, sys-act and del are applied to the CONC graph, which represents the
concrete syntax. In the example, rules in pre and post are abstract triple rules, used to
propagate the changes due to the user-generated events to the abstract syntax model
(ABST graph).

Figure 6 shows the AToM³ base classes for the concrete syntax. As stated before,
all concrete syntax symbols inherit either from Entity (if it is an icon-like entity) or
from Link (if it is an arrow-like entity). Both Entity and Link inherit from VisualObject,
which has information about the object’s location (x and y) and about whether it is being
dragged (selected). Links are connected to Entities via Segments; these can go either
from Entities to Links (e2l) or the other way around (l2e).

Fig. 6. AToM³ base classes for concrete syntax objects and user events.



Some of the classes in Figure 6 model the events that can be generated by the user. 
All the events can be associated to a VisualObject. Some events have additional information,
such as CreateEvent, which contains the type of the VisualObject to be created,
and its position. The MoveEvent contains the position where the object has been moved.
When connecting two Entities, two ConnectEvent objects are generated, one associated
to the source and the other one associated to the target. ErrorEvent signals an error associated
with a certain object; AToM³ presents the text of the error and highlights the
associated object. Finally, the UserEvent class can be used to define new events.

Fig. 7. Some of the event-generator rules.



Figure 7 shows some of the event-generator rules (depicted as evt in Figure 5), 
which model the generation of events by the user. The Create rule is triggered when the 
user clicks on the button to create a certain entity, and then on the canvas. The type of the
object to be created is given by the button that the user clicks, and the x and y coordinates
by the position of the cursor in the canvas. In AToM³, a button is created for each non-abstract
class in the meta-model. The Delete rule is triggered when the user deletes an
object. The type of the object to be deleted is obtained by calling the getType function on
node number one. This is a function which is available in Python (the implementation
language of AToM³) and returns the actual type of an object. Finally, the Connect rule
is invoked when the user connects two Entities. In AToM³ this is performed by clicking
on the connect button and then on the source and the target entities. AToM³ infers (with
the meta-model information) the type of the subclass of Link that must be created in 
between. If several choices exist, then the user selects one of them. The type is then 
passed as a parameter of the rule, and the corresponding creation event is generated.

Fig. 8. Some of the action rules.



Figure 8 shows some of the rules that model the actual execution of the events 
(depicted as sys-act in Figure 5). The first rule models the actual creation of an instance 
(subclass of ASGNode, see Figure 1), together with its associated visual representation
(whose type name is the same as the non-visual instance, but starting with “Graph_” and
is a subclass of Link). The three following rules model the execution of a delete event.
In the first case (DeleteUnConnectedObject rule), the object has no connections. In the
second case (DeleteConnectedEntity rule), the icon-like object has connections, so a
delete event is sent to the connected link, and the segment is erased. The third case
(DeleteConnectedLink rule) models the deletion of a link (the “centre” of an arrow-like
graphical form), which also deletes the associated segment. Please note that all the
rules are executed as long as possible (see Figure 5). The Move rule simply modifies
the position attributes of the object. Finally, the Connect rule models the connection
of a link to two entities. Note that rule Connect in Figure 7 generates a CreateEvent
for the link, so rule Create in Figure 8 is executed first. The rule creates the link with
the correct type. Next, rule Connect in Figure 8 can be applied, as classes Entity and
Link are the base classes for all graphical objects. Note how the appropriate types for
the segments in between links and entities are obtained (from the AToM³ API) through
function TypeOf, which searches the information in the meta-model. Finally, a last set
of rules (not shown in the paper) models the deletion of the events. 



5 Example: Sequence Diagrams 

As an example of the techniques explained before, we have built an environment to define
UML sequence diagrams. By means of meta-modelling we define the abstract and
concrete syntax of this kind of diagram, as well as the correspondence relation between
their elements (see meta-model triple in Figure 4). Starting from this meta-model triple,
AToM³ generates a tool where the user can build models according to that syntax. The
user creates the diagrams at the concrete syntax level, therefore some automatic mechanism
to generate the abstract syntax of the diagrams and maintain their mutual coherence
has to be provided. With this aim we have built a set of event-driven rules triggered by
user actions. Additionally, another set of triple rules checks the consistency between the
sequence diagram and existing class diagrams. Both sets of rules are presented in the
following subsections.

5.1 Abstract and Concrete Syntax of Sequence Diagrams 

These rules manage the creation, editing and deletion of Objects, the creation, editing
and deletion of Messages, and the creation and deletion of object Life Lines. The graphical
actions that do not change the diagram abstract syntax (like creating an Activation
Box) do not need the definition of extra event rules apart from the ones provided by
AToM³ (see Figures 7 and 8).

Rules for the creation, editing and deletion of Objects are the simplest of the set.
These rules create, edit and delete Objects at the abstract syntax level (once the user
generates the corresponding event at the concrete level). Objects at the abstract syntax
are related to the concrete syntax Objects (which received the user event) through an
element in the correspondence graph. Rules for creating objects (both post-actions, see
Figure 5) are shown in Figure 3. The rule on the left creates the object at the abstract
syntax level, while the rule on the right connects (at the abstract syntax level) the object 
with its corresponding class. If the rule on the right cannot be applied, it means that the 
object class has not been created in any class diagram. This inconsistency is tolerated 
at this moment (we do not want to put many constraints in the way the user builds the 
different diagrams), but we have created a grammar to check and signal inconsistencies, 
including this one. The grammar is explained in the next subsection and can be executed 
at any moment in the modelling phase. For the deletion of an object (rules not shown in
the paper), we ensure that it has no incoming or outgoing connection. This is done by
a pre-condition rule (not shown in this paper) that erases the delete event on an object
and presents a message if it has some connection.

The creation of a message is equivalent to connecting two elements belonging to 
the concrete syntax (ConcreteElement, see Figure 4) by means of a relationship of type 
AbsMessage. Obviously users can instantiate neither abstract entities nor abstract relationships,
but only concrete ones. Therefore, at the user level the action to create messages
includes three concrete cases: the connection of two Activation Boxes by means
of a Message relation, the connection from an Activation Box to an Object by means 
of a createMessage relationship, and the connection from a Start Point to an Activation 
Box by means of a startMessage relation. The event rules for managing these three con- 
crete cases are very similar except for the entities and relationships participating in the 
action. That is, we should have a first rule to create a Message relationship if its source 
and target are activation boxes; a second identical rule except for the relationship type 
(createMessage) and the target of the relationship (Object); and a third similar rule except
for the relationship type (startMessage) and source (Start Point). Since the three
rules have the same structure, we use an abstract rule to reduce the grammar size. In
Figure 9 we show the abstract rule compressing the first and third concrete rules mentioned
above. We have used abstraction in many other rules, which greatly reduces the
total number of rules. The rule in Figure 9 generates the abstract syntax of a new message
created by the user, establishing a morphism between the concrete syntax of the
new message (graphical appearance) and its respective abstract syntax. In this particular
case the message concrete syntax is related to more than one abstract syntax entity:
three abstract syntax entities (one Message, one Stimulus and one Action) are graphically
represented using a single symbol on the concrete syntax. On the other hand, the
same event rule has to process the relationship between the newly created message and 
the rest of the model. In this way the successor, predecessor and activator messages of 
the created one have to be computed, as well as the objects sending and receiving the 
message. Additionally, we have to check if the new message activates in its turn another 
block of messages. We have broken down the creation event into a set of 6 user-defined
events, each performing one step in the process. Thus the number of rules is reduced 
and the processing is easier. 

Other rules (not shown in the paper) calculate the predecessor of a message. This 
is the previous one in the same processing block (the activation boxes corresponding 
with a method execution), or none if the message is the first one in the block. A total of
16 rules have been defined to manage the creation and editing of objects and messages.
Some other rules, similar to the previous ones, manage the creation and deletion of Life
Lines. The processing of the event (creation or deletion) triggers the execution of other
user-defined events, simpler to process. Most of these events are the same as the ones
generated by the rule in Figure 9, therefore reuse of rules has been possible. Due to
space limitations, we do not show all the rules, which are 39 in total.

[Figure 9: rule titled “Message and startMessage Creation (post-rule)”, with LHS and RHS. Node labels in the figure: SynchronousInvocationAction, Message, CorrespondenceMessage, ConcreteElement, Entity, ActivationBox, Graph_ActivationBox, ConnectEvent (which=Source), CreateEvent (type=‘AbsMessage’), ConnectEvent (which=Target), and six UserEvent nodes with Type “ProcessPredecessor”, “ProcessSuccessor”, “ProcessSender”, “ProcessReceiver”, “ProcessActivator” and “DeleteActivator”.]

Fig. 9. Abstract rule for Creating Messages and createMessages.

5.2 Consistency Checking

Triple rules can be used not only to maintain coherence between concrete and abstract
syntax, but also to check consistency between different types of diagrams. The present
work is part of a more general project with the aim to formalize the dynamic semantics
of UML [8] by means of transformations into semantic domains (up to now Petri nets). 
Before translation, consistency checks should be performed between the defined
diagram (in this case a sequence diagram) and existing ones, such as class diagrams. 
Note how, while the user builds a sequence diagram, the previous rules add abstract 
syntax elements to a unique abstract syntax model. In this way, one has a unique abstract 
syntax model and possibly many concrete syntax models, one for each defined diagram 
(of any kind). 

Using simple triple rules, we can perform consistency checks between the sequence
diagram and an existing abstract syntax model, generated by previously defined
diagrams. For example, we may want to check that the class of each object used in a
sequence diagram has been defined in some of the existing class diagrams; that, if an object
invokes a method of another object, the method has been defined in that object's
class and there is a navigable relationship between both object classes (see
Figure 10); and that the invoked method is visible from the calling class.




Fig. 10. One of the Rules for Consistency Checking.

We define consistency triple rules in such a way that their LHSs are conditions
that are sought in the defined diagram (sequence diagrams in our case), possibly in the
concrete and abstract parts. NACs are typically conditions to be sought in the existing
abstract model with which we want to check consistency. If the rule is applied, its
RHS sends an event of type ErrorEvent to some of the objects matched by the LHS.
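
The structure of such a consistency rule can be sketched in Python for the first check mentioned above (the class of each object must be defined in some class diagram). The sketch is a simplification under assumed encodings: the LHS match is an object of the sequence diagram, the NAC is the existence of its class in the existing abstract model, and the RHS is represented by a dictionary standing for an ErrorEvent. The function name, the data layout and the error text are all hypothetical.

```python
def check_object_classes(objects, defined_classes):
    """Return one ErrorEvent-like record per object whose class is undefined.

    objects:         {object_name: class_name} taken from the sequence diagram
    defined_classes: set of class names present in the existing abstract model
    """
    return [
        {
            "event": "ErrorEvent",
            "target": name,
            "text": f"class '{cls}' is not defined in any class diagram",
        }
        for name, cls in sorted(objects.items())
        # NAC: the rule applies only if no matching class exists.
        if cls not in defined_classes
    ]


objects = {"o1": "Customer", "o2": "Invoice"}
errors = check_object_classes(objects, defined_classes={"Customer"})
```

Each returned record corresponds to one application of the rule: the object matched by the LHS receives an error event because the condition expressed by the NAC was not found.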

6 Related Work 

At first glance, the present work may resemble the syntax directed approach for the
definition of a VL. In this approach one defines a rule for each possible editing action, 
and the user builds the model by selecting the rules to be applied. Our approach is quite 
different, in the sense that we use a meta-model for the definition of the VL. The meta- 
model (which may include some constraints) provides all the information needed for the 
generation of the VL. The user builds the model by interacting with the user interface. 
In our approach we explicitly represent these events in the rules. Rules are triggered by 
the events, but the user may not be aware of this fact. In the examples, we have shown 
the combination of event-driven grammars with triple grammars to build the abstract 
syntax model and to perform consistency checks. 

In the approach of [4], a restricted form of Statecharts was defined using a pure 
graph grammar approach (no meta-models). For this purpose, they used a low level 
(LLG, concrete syntax) and a high level (HLG, abstract syntax) representation. To ver- 
ify the correctness, the LLG has to be transformed into an HLG (using a regular graph 
grammar), and a parsing grammar has to be defined for the latter. Another parsing approach,
based on constraint multiset grammars, is that of CIDER [9].

Other approaches for the definition of the VLs of the different UML diagrams usually
concentrate either on the concrete or the abstract syntax, but not on both. For example,
in [5], graph transformation units are used to translate from sequence diagrams
into collaboration diagrams. Note that both kinds of diagrams share the same abstract
syntax, so in our case, a translation is not necessary, but we have to define triple rules 
to build the abstract syntax from the concrete one. 

7 Conclusions 

In this paper we have presented event-driven grammars in which user interface events 
are made explicit, and system actions in response to these events are modelled as graph 
grammar rules. Their combination with abstract triple rules and meta-modelling is an 
expressive means to describe the relationships between concrete and abstract syntax 
models (formally defined through meta-models). Rules can model pre- and post-conditions
and actions for events to take place. Furthermore, we can use the information in
the meta-models to define abstract rules, which are equivalent to a number of concrete 
ones, where nodes are replaced by each element in its inheritance clan. In this work, we 
have extended (in a straightforward way) the original work in [3] to allow refinement 
of relationships. 

The applicability of these concepts has been shown by an example, in which we 
have defined a meta-model triple for the abstract and concrete syntax of sequence
diagrams (according to the UML 1.5 specification). Additionally, we have presented some
rules to check the consistency of sequence diagram models with an existing abstract
syntax model, generated by the previous definition of other diagrams.

Regarding future work, we want to derive validation techniques for triple,
event-driven grammars. We also plan to use triple graph grammars to describe heuristics for
the creation of UML diagrams. For example, if the user creates an object in a sequence
diagram which belongs to a non-existing class, one option is to raise a consistency
warning. Another possibility is to automatically derive the concrete syntax of a class
diagram from the information in the abstract syntax (classes, methods, etc.) generated by
the sequence diagram.
Abstract. Biochemical pathways, such as metabolic, regulatory, and
signal transduction pathways, constitute complex networks of functional 
and physical interactions between molecular species in the cell. They are 
represented in a natural way as graphs, with molecules as nodes and 
processes as arcs. In particular, metabolic pathways are represented as 
directed graphs, with the substrates, products, and enzymes as nodes and 
the chemical reactions catalyzed by the enzymes as arcs. In this paper, 
chemical reactions in a metabolic pathway are described by edge relabel- 
ing graph transformation rules, as explicit chemical reactions and also 
as implicit chemical reactions, in which the substrate chemical graph, 
together with a minimal set of edge relabeling operations, determines 
uniquely the product chemical graph. Further, the problem of construct- 
ing all pathways that can accomplish a given metabolic function of trans- 
forming a substrate chemical graph to a product chemical graph using a 
set of explicit chemical reactions, is stated as the problem of finding an 
appropriate set of sequences of chemical graph transformations from the 
substrate to the product, and the design of a graph transformation sys- 
tem for the analysis of metabolic pathways is described which is based 
on a database of explicit chemical reactions, a database of metabolic 
pathways, and a chemical graph transformation system. 



1 Introduction 

Biochemical pathways, such as metabolic, regulatory, and signal transduction 
pathways, constitute complex networks of functional and physical interactions 
between molecular species in the cell. They are represented in a natural way as 
graphs, with molecules as nodes and processes as arcs. In particular, metabolic 
pathways are represented as directed graphs, with the substrates, products, and 
enzymes as nodes and the chemical reactions catalyzed by the enzymes as arcs. 
Chemical descriptions in a metabolic pathway can be made at different levels
of resolution: a molecular descriptor uniquely identifies a molecule in a chem- 
ical database; a molecular formula indicates the number of each type of atom 
in a molecule; a constitutional formula or chemical graph also indicates which 
pairs of these atoms are bonded; and a structural formula also indicates those 
stereochemical distinctions that are required to uniquely identify a molecule. 

In a detailed representation of metabolic pathways, at the level of the consti- 
tutional formula or the structural formula, structural change of chemical reac- 
tions can be modeled by superimposing the reactant and the product to match 
up the atoms and bonds that are unchanged in the transformation. A formalism 
called imaginary transition structures was introduced in [8-10] to model chemical 
reactions, where the chemical graphs representing the reactions’ substrate and 
product are superimposed topologically, and the bonds are then distinguished 
and classified into three categories: out-bonds (bonds appearing only in the sub- 
strate molecules), in-bonds (bonds appearing only in the product molecules), and 
par-bonds (bonds appearing in both the substrate and the product molecules). 
Imaginary transition structures can be seen as double-pushout transformation
rules [3] over chemical graphs: the left-hand side, context, and right-hand side
are chemical graphs with a set of labeled nodes corresponding to the atoms in their
molecules; the left-hand side graph has edges representing out-bonds and par-bonds,
the context graph has edges representing par-bonds only, and the right-hand side
graph has edges representing in-bonds and par-bonds. This is, essentially, the view
of chemical reactions advocated in [1].

In this paper, chemical reactions in a metabolic pathway are described by 
edge relabeling graph transformation rules, as explicit chemical reactions and 
also as implicit chemical reactions, in which the substrate chemical graph, to- 
gether with a minimal set of edge relabeling operations, determines uniquely the 
product chemical graph. 

Further, the problem of constructing all pathways that can accomplish a 
given metabolic function of transforming a substrate chemical graph to a product 
chemical graph using a set of explicit chemical reactions, is stated as the problem 
of finding an appropriate set of sequences of chemical graph transformations from 
the substrate to the product, and the design of a graph transformation system 
for the analysis of metabolic pathways is described, which is based on a database 
of explicit chemical reactions, a database of metabolic pathways, and a chemical 
graph transformation system. 

The rest of the paper is organized as follows. Chemical reactions are viewed 
in Section 2 as edge relabeling graph transformation rules, where both explicit 
and implicit chemical reactions are introduced. In Section 3, two related sets 
of axioms on the structure of metabolic pathways are reviewed and the graph 
transformation problem of analyzing a metabolic pathway is discussed. Feasi- 
ble reaction pathway axioms model networks of chemical reactions that follow a 
series of accepted first principles and conditions, while combinatorially feasible 
reaction pathway axioms relax some of these conditions. Finally, some conclu- 
sions are outlined in Section 4. 
2 Modeling Chemical Reactions

A chemical reaction is the change produced by two or more molecules acting 
upon each other. In a chemical reaction, substrate molecules are transformed 
into product molecules, often in the presence of a catalyst. For simplicity, we 
shall assume henceforth that the catalysts of a reaction are part of both its 
substrate and product. 

Example 1. Consider, for example, the acidic hydrolysis of ethyl acetate, which
is described by the following equation:

$\mathrm{CH_3COOCH_2CH_3 + H_2O + HCl \longrightarrow CH_3COOH + CH_3CH_2OH + HCl}$

The substrate of the reaction, ethyl acetate ($\mathrm{CH_3COOCH_2CH_3}$) and water
($\mathrm{H_2O}$), is transformed into acetic acid ($\mathrm{CH_3COOH}$) and ethanol ($\mathrm{CH_3CH_2OH}$)
in the presence of a catalyst, hydrochloric acid ($\mathrm{HCl}$).

It is usual to describe molecules as graphs, with nodes representing the atoms, 
each one of them labeled by the name of the corresponding element, and edges 
representing the bonds, with a positive weight describing the order of the bond 
(1 for a single bond, 2 for a double bond, etc.). A set of molecules is consequently 
described by the disjoint union of the graphs representing them. For reasons that 
will be clear below, we shall allow the existence in these graphs of one or more 
edges labeled 0: they should be seen as non-existent. Let us call these graphs 
representing sets of molecules chemical graphs. 

Definition 1. A chemical graph is a weighted graph $(V, E, p)$, where $(V, E)$
is an undirected graph (without multiple edges or self-loops) all of whose nodes are
labeled by means of chemical elements, and $p : E \to \mathbb{N}$ is a weight function. The
valence of a node in a chemical graph is the total weight of the edges incident
to it.
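
As a minimal sketch of Definition 1, a chemical graph can be stored as a dictionary of node labels plus a dictionary of edge weights, and the valence of each node obtained by summing the weights of its incident edges. The node numbering and the function name `valences` are assumptions made for illustration only.

```python
def valences(labels, weights):
    """Return the valence of each node: the total weight of its incident edges."""
    total = {v: 0 for v in labels}          # nodes without edges get valence 0
    for (u, v), w in weights.items():
        total[u] += w
        total[v] += w
    return total

# Carbon dioxide, O=C=O, with a hypothetical node numbering.
labels = {1: "O", 2: "C", 3: "O"}
weights = {(1, 2): 2, (2, 3): 2}            # two double bonds
print(valences(labels, weights))            # {1: 2, 2: 4, 3: 2}
```

Edges are keyed by node pairs, so an edge labeled 0 can be stored explicitly without contributing to any valence.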

Chemical reactions consist of breaking, forming and changing bonds in sets of 
molecules. Therefore, a chemical reaction can be represented by the transforma- 
tion of the chemical graph representing the reaction’s substrate into the chemical 
graph representing the product. This transformation will satisfy a set of specific 
conditions. First, the number and type of the atoms in the substrate and the 
product must be the same, and therefore the transformation must induce the 
identity on the set of labeled nodes. Besides, and for simplicity, we shall restrict 
ourselves in this paper to chemical reactions where each individual atom has 
the same valence in the substrate and in the product: from the point of view of 
graphs, this corresponds to ensuring that the total weight of edges incident to each
node remains constant after the transformation. In a more general setting we 
could simply impose the conservation of the total number of valence electrons, 
which would correspond to the conservation of the sum of all edges’ weights. 

A systematic study of organic chemical reactions was made in [8-10], where a 
formalism called imaginary transition structures was introduced to model chem- 
ical reactions. The imaginary transition structure of a given reaction is a struc- 
tural formula in which, using our language, the unweighted chemical graphs 
representing the reaction's substrate and product are superimposed topologically,
and the bonds are then distinguished and classified into three categories:
out-bonds (bonds appearing only in the substrate molecules), in-bonds (bonds 
appearing only in the product molecules), and par-bonds (bonds appearing in 
both the substrate and the product molecules). 

Example 2. The acidic hydrolysis of ethyl acetate from Example 1 can also be 
depicted as a transformation between the chemical graph to the left (representing 
the substrate) and the chemical graph to the right (representing the product) in 
the following diagram: 
The corresponding imaginary transition structure is shown in Fig. 1, where
out-bonds are denoted by solid lines crossed by a bar, par-bonds are denoted by plain
solid lines, and in-bonds are denoted by solid lines crossed by a small circle; for
simplicity, we have assigned single nodes to groups of atoms like $\mathrm{CH_3}$ and $\mathrm{CH_2}$
that are not broken in the reaction.
Fig. 1. The imaginary transition structure of acidic hydrolysis of ethyl acetate.



An imaginary transition structure can be seen as a double-pushout transfor- 
mation rule [3] over chemical graphs. The left-hand side, context, and right-hand 
side are chemical graphs with a set of labeled nodes corresponding to the atoms
in their molecules; the left-hand side graph has edges representing out-bonds and
par-bonds, the context graph has edges representing par-bonds only, and the
right-hand side graph has edges representing in-bonds and par-bonds. This is,
essentially, the view of chemical reactions advocated in [1].
An imaginary transition structure can also be seen as a set of edge relabeling
operations applied to a chemical graph. The substrate of the chemical reaction is
represented by a chemical graph with a set of labeled nodes corresponding to the
atoms in its molecules and edges representing bonds that exist in the substrate 
or in the product of the reaction. The edges’ weights are then assigned according 
to the definition of chemical graph given above: an edge is labeled 0 if the corre- 
sponding bond does not exist in the substrate (and then, by construction, it must 
exist in the product), and if the bond exists in the substrate, then it is weighted 
according to its order. The chemical reaction is then simply described by rela- 
beling of the edges in this graph: a bond existing in some substrate molecule 
that breaks in the product molecules is relabeled by 0; a new bond appearing in 
some product molecule is assigned the label corresponding to its order; a bond 
that exists both in a substrate molecule and in a product molecule, but with
a different order, is relabeled according to its new order; and, finally, labels
of bonds that are not modified at all by the chemical reaction are not modified 
either. 

This description of a chemical reaction, based on Fujita’s imaginary transi- 
tion structures, motivates the introduction of the notion of an explicit chemical 
reaction. 

Definition 2. An explicit chemical reaction is a structure $(V, E, \sigma, \pi)$, where
$(V, E, \sigma)$ and $(V, E, \pi)$ are chemical graphs, called the substrate and the product
chemical graphs respectively, satisfying the following conditions:

(i) There is no $e \in E$ such that $\sigma(e) = \pi(e) = 0$.

(ii) For every $v \in V$, if $e_1, \ldots, e_k$ are the edges incident to it, then

$\sigma(e_1) + \cdots + \sigma(e_k) = \pi(e_1) + \cdots + \pi(e_k) \geq 1$.

Every imaginary transition structure, and hence every chemical reaction,
can be represented by means of an explicit chemical reaction $(V, E, \sigma, \pi)$ with
$(V, E, \sigma)$ and $(V, E, \pi)$ being the chemical graphs describing its substrate and
product.
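
Conditions (i) and (ii) of Definition 2 can be checked mechanically. The following sketch assumes the two weight functions are given as dictionaries over the same edge set; the function names and the toy reaction (hydrogen and chlorine combining into hydrogen chloride, with a hypothetical node numbering) are illustrative and not taken from the paper.

```python
def incident_weight(weights, v):
    """Total weight of the edges in `weights` that are incident to node v."""
    return sum(w for edge, w in weights.items() if v in edge)

def is_explicit_reaction(nodes, sigma, pi):
    """Check conditions (i) and (ii) of Definition 2 for weight maps sigma and pi."""
    if sigma.keys() != pi.keys():
        return False                      # both graphs must share the edge set E
    # (i) no edge is labeled 0 in both the substrate and the product
    if any(sigma[e] == 0 and pi[e] == 0 for e in sigma):
        return False
    # (ii) every node keeps its valence, and that valence is at least 1
    for v in nodes:
        s, p = incident_weight(sigma, v), incident_weight(pi, v)
        if s != p or s < 1:
            return False
    return True

# H-H + Cl-Cl -> 2 H-Cl, with edges labeled 0 where a bond is absent.
nodes = {1: "H", 2: "H", 3: "Cl", 4: "Cl"}
sigma = {(1, 2): 1, (3, 4): 1, (1, 3): 0, (2, 4): 0}
pi = {(1, 2): 0, (3, 4): 0, (1, 3): 1, (2, 4): 1}
print(is_explicit_reaction(nodes, sigma, pi))   # True
```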

Example 3. Consider again the acidic hydrolysis of ethyl acetate from Exam- 
ples 1 and 2. The diagram in Fig. 2 represents the explicit chemical reaction 
describing it: the left-hand side graph depicts the graph $(V, E)$, and the weight
functions $\sigma$ and $\pi$ are given in the right-hand side table.

The application of an explicit chemical reaction to a given chemical graph
consists of relabeling the substrate by the product within the given chemical graph.

Definition 3. A chemical graph $(V, E, p)$ is a subgraph of a chemical graph
$(V', E', p')$ if $V \subseteq V'$, $E \subseteq E'$ and for all edges $e \in E$, $p(e) \leq p'(e)$. An explicit
chemical reaction $(V, E, \sigma, \pi)$ can be applied to a chemical graph $(V', E', p)$
if $(V, E, \sigma)$ is a subgraph of $(V', E', p)$. In such a case, the application of
$(V, E, \sigma, \pi)$ to $(V', E', p)$ results in a chemical graph $(V', E', p')$, where
$p'(e) = p(e)$ for all edges $e \in E' \setminus E$ and $p'(e) = \pi(e)$ for all edges $e \in E$.
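
The relabeling step of Definition 3 can be sketched as follows. This is a simplified illustration: it assumes that the reaction and the host graph already share node identifiers (no subgraph matching is performed), and it uses a deliberately strict applicability test in which each reaction edge must occur in the host with exactly its substrate weight. The function name `apply_reaction` is hypothetical.

```python
def apply_reaction(host, sigma, pi):
    """Apply the explicit reaction (sigma, pi) to `host`; return None if not applicable."""
    # Strict applicability test (an assumption of this sketch): every reaction
    # edge must be present in the host with exactly its substrate weight.
    if any(host.get(e) != w for e, w in sigma.items()):
        return None
    result = dict(host)      # edges outside E keep their weight
    result.update(pi)        # edges in E take their product weight
    return result

# H-H + Cl-Cl -> 2 H-Cl applied to a host that also contains an O=O molecule.
sigma = {(1, 2): 1, (3, 4): 1, (1, 3): 0, (2, 4): 0}
pi = {(1, 2): 0, (3, 4): 0, (1, 3): 1, (2, 4): 1}
host = {(1, 2): 1, (3, 4): 1, (1, 3): 0, (2, 4): 0, (5, 6): 2}
print(apply_reaction(host, sigma, pi))
# {(1, 2): 0, (3, 4): 0, (1, 3): 1, (2, 4): 1, (5, 6): 2}
```

The spectator edge (5, 6) is left untouched, while the four edges of the reaction take their product weights.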
Fig. 2. The explicit chemical reaction of acidic hydrolysis of ethyl acetate.



Now, in general, it will not be necessary to provide the full substrate and
product edge weight functions $\sigma$ and $\pi$ of an explicit chemical reaction to
describe the corresponding edge labeling transformation. Indeed, it is enough to
specify the substrate chemical graph and a minimal set of edge relabeling op- 
erations which, when applied to a graph taking into account conditions (i) and 
(ii) in Definition 2, determine uniquely the product chemical graph. Since the 
undirected graph underlying the substrate chemical graph is finite, such a min- 
imal set of edge relabeling operations will always exist, although it need not be 
unique. 



Example 4. Consider one more time the acidic hydrolysis of ethyl acetate. As
can be seen in the table given in Fig. 2, this chemical reaction corresponds to six
edge relabeling operations (see also Example 2). Now, it can be easily checked
that each one of these edge relabeling operations, together with the conservation
of the atoms' valences and the structure of the underlying undirected graph,
entails the other five, and hence it completely describes the chemical reaction.

For instance, assume we relabel from 0 to 1 the edge 8-11, joining a hydrogen 
in the water molecule and the chlorine in the hydrochloric acid. To preserve 
these atoms’ valences, edges 7-8 and 10-11 must be relabeled to 0. And then 
to preserve the oxygen and hydrogen atoms’ valences involved in these two last 
edges, the only possibility is to relabel edges 2-7 and 3-10 to 1. And finally, in 
order to preserve the valences of the carbon and oxygen atoms involved in these 
two edges, it can be checked that the only possibility is to relabel edge 2-3 to 
0: any other relabeling modifies the valence of other atoms bound to the carbon 
or the hydrogen. 
This leads us to the definition of an implicit chemical reaction.

Definition 4. An implicit chemical reaction is a finite set $R$ of edge relabeling
operations of the form

$r = (A_{r,1}, A_{r,2}, w_{r,s}, w_{r,p})$

with $A_{r,1}$ and $A_{r,2}$ atomic elements and $w_{r,s}, w_{r,p} \in \mathbb{N}$ such that $w_{r,s} \neq w_{r,p}$.
Such an implicit chemical reaction can be applied to a chemical graph
$(V, E, \sigma)$ when the following conditions are satisfied:

(a) For every $r \in R$ there is one edge $e_r$ in $E$ whose nodes have labels $A_{r,1}$ and
$A_{r,2}$, whose weight is $w_{r,s}$, and such that $w_{r,p}$ is less than or equal to the
valence of both these nodes.

(b) There is one, and only one, chemical graph $(V, E, \tau)$, with the same nodes
and edges as $(V, E, \sigma)$, such that

• all its nodes have the same valence as in $(V, E, \sigma)$;

• $\tau(e_r) = w_{r,p}$ for every $r \in R$;

• if $\sigma(e) = 0$, then $\tau(e) \neq 0$.

The result of this application is, then, the chemical graph $(V, E, \tau)$, and this
application is said to represent the explicit chemical reaction $(V, E, \sigma, \tau)$.
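
Condition (b) of Definition 4 can be explored by brute force on small graphs. The sketch below assumes that the matching step of condition (a) has already been carried out, so that `forced` maps each chosen edge $e_r$ to its new weight $w_{r,p}$; it then enumerates every weight function satisfying the three bullet points. The function names and the toy graph (hydrogen and chlorine, hypothetical node numbering) are illustrative, and the exhaustive search is only practical for very small graphs.

```python
from itertools import product

def incident_weight(weights, v):
    """Total weight of the edges in `weights` that are incident to node v."""
    return sum(w for edge, w in weights.items() if v in edge)

def candidate_products(nodes, sigma, forced):
    """Enumerate all weight maps tau satisfying the bullets of condition (b)."""
    edges = list(sigma)
    # No edge weight can exceed the largest valence in the substrate graph.
    bound = max(incident_weight(sigma, v) for v in nodes)
    found = []
    for ws in product(range(bound + 1), repeat=len(edges)):
        tau = dict(zip(edges, ws))
        if any(tau[e] != w for e, w in forced.items()):
            continue                      # tau(e_r) must equal w_{r,p}
        if any(sigma[e] == 0 and tau[e] == 0 for e in edges):
            continue                      # sigma(e) = 0 implies tau(e) != 0
        if all(incident_weight(tau, v) == incident_weight(sigma, v) for v in nodes):
            found.append(tau)             # all valences are preserved
    return found

nodes = {1: "H", 2: "H", 3: "Cl", 4: "Cl"}
sigma = {(1, 2): 1, (3, 4): 1, (1, 3): 0, (2, 4): 0}
# A single relabeling operation: the H-Cl edge (1, 3) goes from weight 0 to 1.
found = candidate_products(nodes, sigma, {(1, 3): 1})
print(found)   # [{(1, 2): 0, (3, 4): 0, (1, 3): 1, (2, 4): 1}]
```

The application is defined exactly when the returned list has one element, which is then the product weight function.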

Notice that an implicit chemical reaction need not admit an application to 
a given chemical graph, even if it satisfies condition (a). Notice also that a given
implicit chemical reaction may admit several applications to a given chemical
graph, possibly giving different results. It could be interesting to study application
and uniqueness conditions for implicit chemical reactions, but we shall not
consider them here, as we are dealing with already existing chemical reactions,
which we represent by means of an application to a suitably defined chemical 
graph of an implicit chemical reaction that has been determined beforehand. 

3 Analyzing Metabolic Pathways 

Metabolism can be regarded as a network of chemical reactions catalyzed by en- 
zymes and connected via their substrates and products, and a metabolic pathway 
can be regarded as a coordinated sequence of chemical reactions [4]. The definition
of a metabolic pathway is not exact, and most pathways indeed constitute
highly intertwined cyclic networks. In a cell, a pathway's substrates are usually
the products of another pathway, and there are junctions where pathways
meet or cross [13]. For the purposes of this paper, we shall adopt the following 
definition. 

Definition 5. A metabolic pathway is a connected directed graph $(C, R)$,
where $C$ is a set of chemical graphs and $R \subseteq C \times C$ is a set of explicit chemical
reactions. The substrate of $(C, R)$ is the set of chemical graphs $S \subseteq C$ such
that for all $(V, E, \pi)$ in $S$, there is no explicit chemical reaction of the form
$(V, E, \sigma, \pi)$ in $R$. The product of $(C, R)$ is the set of chemical graphs $P \subseteq C$
such that for all $(V, E, \sigma) \in P$, there is no explicit chemical reaction of the form
$(V, E, \sigma, \pi)$ in $R$.
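
Read as a statement about the directed graph $(C, R)$, Definition 5 says that the substrate collects the chemical graphs that no reaction produces, and the product collects those that no reaction consumes. A minimal sketch, in which chemical graphs are replaced by opaque compound names and cofactors are omitted (both simplifications are assumptions of this example):

```python
def substrate_and_product(C, R):
    """Return (S, P): compounds never produced, and compounds never consumed."""
    produced = {p for (_, p) in R}
    consumed = {s for (s, _) in R}
    return C - produced, C - consumed

# First two steps of glycolysis, heavily simplified.
C = {"glucose", "glucose-6-phosphate", "fructose-6-phosphate"}
R = {("glucose", "glucose-6-phosphate"),
     ("glucose-6-phosphate", "fructose-6-phosphate")}
S, P = substrate_and_product(C, R)
print(S, P)   # {'glucose'} {'fructose-6-phosphate'}
```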



Analysis of Metabolic Pathways by Graph Transformation 






The analysis of metabolic pathways is motivated by the rapidly increasing 
quantity of available information on metabolic pathways for different organ- 
isms. One of the most comprehensive sources of metabolic pathway data is [19]. 
There are also several databases on metabolic pathways, such as aMAZE [17], 
BRENDA [23], EcoCyc [14], KEGG [12], and WIT [21]. These databases contain
hundreds of metabolic pathways and thousands of chemical reactions, and even 
the metabolic pathway for a small organism constitutes a large network. For 
instance, the proposed metabolic pathway for the bacterium E. coli consists of 
436 compounds (substrates, products, and intermediate compounds) linked by 
720 reactions [5]. 

One aspect of metabolic pathway analysis is flux analysis: the decomposition
of a metabolic pathway into a complete set of nondecomposable steady state flux
distributions. Two similar approaches to flux analysis are known, which are based
on the set of elementary flux modes [24] and on the set of extreme pathways [22].
In large metabolic networks, however, these approaches are hampered by the 
combinatorial explosion of possible routes: the maximal number of elementary 
flux modes in a metabolic pathway is exponential in the number of reactions, 
substrates, and products [15]. 

Another, complementary aspect of metabolic pathway analysis is pathway
synthesis: the construction of all pathways that can accomplish a given metabolic
function, that is, the transformation of a given set of substrates to a given set
of products. Pathway synthesis belongs in pathway analysis, because it allows
biologists and biochemists to contrast those metabolic pathways which exist in 
the cell for different organisms, against feasible metabolic pathways obtained by 
synthesis. 

In pathway synthesis, much like in retrosynthetic analysis in organic chem- 
istry [2,26], the target chemical graph is subjected to a disconnection process, 
which corresponds to the reverse of a chemical reaction. As a result, the target 
chemical graph is transformed to a sequence of simpler chemical graphs in a 
stepwise manner, along a path that ultimately leads to simple chemical graphs. 
For a complex target chemical graph, some intermediate chemical graphs may 
undergo further retrosynthetic analysis. Thus, the repetition of this process even- 
tually will result in a hierarchical synthesis tree for the target chemical graph. 

In order to synthesize meaningful metabolic pathways, axioms on reaction 
pathways have been established in [6,25], based on [7]. A first set of axioms, 
the feasible reaction pathway axioms, establish that (R1) every product is totally
produced by reactions represented in the pathway; (R2) every substrate
is totally consumed by reactions represented in the pathway; (R3) intermediate 
compounds are entirely produced by previous reactions and completely con- 
sumed by subsequent reactions; (R4) each reaction represented in the pathway 
is defined a priori; (R5) the network representing the pathway is acyclic; and 
(R6) at least one reaction represented in the pathway affects the activation of a 
substrate. 

Among these axioms for feasible reaction pathways, (R4) follows from Defi- 
nition 5. The remaining ones are enforced by the following definition. 
Definition 6. A metabolic pathway $(C, R)$ is said to be a feasible reaction
pathway if it satisfies the following axioms:

— For each product chemical graph of the form $(V, E, p)$ in $P$, there is a set
of explicit chemical reactions $R' \subseteq R$ of the form $(V', E', \sigma', \pi')$ such that
$(V, E, p)$ is a subgraph of the union of the product chemical graphs $(V', E', \pi')$
in $R'$.

— For each substrate chemical graph of the form $(V, E, p)$ in $S$, there is a
set of explicit chemical reactions $R' \subseteq R$ of the form $(V', E', \sigma', \pi')$ such
that $(V, E, p)$ is a subgraph of the union of the substrate chemical graphs
$(V', E', \sigma')$ in $R'$.

— For each chemical graph of the form $(V, E, p)$ in $C \setminus (S \cup P)$, there is a set
of explicit chemical reactions $R' \subseteq R$ of the form $(V', E', \sigma', \pi')$ such that
$(V, E, p)$ is a subgraph of the union of the product chemical graphs $(V', E', \pi')$
in $R'$, and there is a set of explicit chemical reactions $R'' \subseteq R$ of the form
$(V'', E'', \sigma'', \pi'')$ such that $(V, E, p)$ is a subgraph of the union of the substrate
chemical graphs $(V'', E'', \sigma'')$ in $R''$.

— $(C, R)$ is acyclic.

— There is an explicit chemical reaction of the form $(V, E, \sigma, \pi)$ in $R$, with
$(V, E, \sigma)$ a substrate chemical graph in $S \subseteq C$.
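
Of the axioms in Definition 6, acyclicity is the easiest to test in isolation. The sketch below uses Kahn's topological sorting procedure, which is a standard technique chosen for this illustration rather than one prescribed by the paper; chemical graphs are again replaced by opaque compound names, and the function name `is_acyclic` is hypothetical.

```python
from collections import defaultdict, deque

def is_acyclic(C, R):
    """Return True if the directed graph (C, R) contains no directed cycle."""
    indegree = {c: 0 for c in C}
    successors = defaultdict(list)
    for s, p in R:
        successors[s].append(p)
        indegree[p] += 1
    queue = deque(c for c in C if indegree[c] == 0)
    visited = 0
    while queue:
        c = queue.popleft()
        visited += 1
        for nxt in successors[c]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    # Every node is removed exactly when no cycle blocks the process.
    return visited == len(C)

C = {"A", "B", "C"}
print(is_acyclic(C, {("A", "B"), ("B", "C")}))               # True
print(is_acyclic(C, {("A", "B"), ("B", "C"), ("C", "A")}))   # False
```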

A second set of axioms, the combinatorially feasible reaction pathway axioms, 
allow one to focus on the combinatorial properties of the network comprising the 
feasible reaction pathways, as the condition imposed by axiom (R5) is relaxed 
except for the cycles formed by the forward and reverse directions of a chemical 
reaction, and the condition imposed by axiom (R6) is discarded. These axioms 
establish that (T1) every product is represented in the network; (T2) every
substrate is represented in the network; (T3) each reaction represented in the 
network is defined a priori; (T4) every compound represented in the network 
has at least one path leading to a product of the network; (T5) every compound 
represented in the network is a substrate or a product for at least one reaction 
represented in the network; (T6) a substrate of any reaction represented in the 
network is a substrate of the network if it is not a product of any reaction 
represented in the network; and (T7) each reaction represented in the network 
is either forward or backward, but not both. 

Among these axioms for combinatorially feasible reaction pathways, (T3) and 
(T4) follow from Definition 5. The remaining ones are enforced by the following 
definition. 

Definition 7. Let $S'$ and $P'$ be fixed sets of chemical graphs. A metabolic pathway
$(C, R)$ is said to be a combinatorially feasible reaction pathway with
respect to $S'$ and $P'$ if it satisfies the following axioms:

— $P' \subseteq C$.

— $S' \subseteq C$.

— For each chemical graph of the form $(V, E, p)$ in $C$, there is an explicit chemical
reaction of the form $(V', E', \sigma', \pi')$ in $R$ such that either $(V, E, p)$ is a
subgraph of $(V', E', \sigma')$, or $(V, E, p)$ is a subgraph of $(V', E', \pi')$.

— For each explicit chemical reaction of the form $(V, E, \sigma, \pi)$ in $R$, $(V, E, \sigma)$ is
in the substrate $S$ of $(C, R)$ if there is no explicit chemical reaction of the
form $(V', E', \sigma', \pi')$ in $R$ such that $(V, E, \sigma)$ is a subgraph of $(V', E', \pi')$.

— There are no two explicit chemical reactions of the form $(V, E, \sigma, \pi)$ and
$(V, E, \pi, \sigma)$ in $R$.
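
Some of the axioms of Definition 7 reduce to simple set tests once chemical graphs are abstracted to opaque names. The following sketch checks four of the five axioms under that simplification: the subgraph conditions are replaced by plain membership, and the axiom relating reaction substrates to the substrate of the network is left out. All names are hypothetical.

```python
def check_combinatorial_axioms(C, R, S_fixed, P_fixed):
    """Report which of four simplified axioms hold for the network (C, R)."""
    return {
        "products represented": P_fixed <= C,
        "substrates represented": S_fixed <= C,
        "every compound in a reaction": all(any(c in r for r in R) for c in C),
        "no reaction in both directions": not any((p, s) in R for (s, p) in R),
    }

C = {"A", "B", "C"}
R = {("A", "B"), ("B", "C")}
report = check_combinatorial_axioms(C, R, {"A"}, {"C"})
print(all(report.values()))   # True
```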

Now, in the analysis of metabolic pathways, the problem of constructing 
all pathways that can accomplish a given metabolic function can be stated as 
follows. 

Problem 1 (Synthesis of metabolic pathways). Given a substrate chemical graph
$S'$, a product chemical graph $P'$, and a set $R'$ of explicit chemical reactions,
find one or all feasible or combinatorially feasible metabolic pathways $(C, R)$
with substrate $S = S'$, product $P = P'$, and set of explicit chemical reactions
$R \subseteq R'$.

Since explicit chemical reactions are edge relabeling graph transformation 
rules, the metabolic pathway synthesis problem can be solved by graph
transformation, as follows. Given a substrate chemical graph $S'$, a product chemical
graph $P'$, and a set $R'$ of explicit chemical reactions, a metabolic pathway $(C, R)$
with substrate $S = S'$, product $P = P'$, and set of explicit chemical reactions
$R \subseteq R'$ is given by a set of sequences of chemical graph transformations with
substrate $S$ and product $P$.
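
One such sequence of chemical graph transformations can be found by a breadth-first search over the chemical graphs reachable from the substrate. The sketch below is a strong simplification and not the system described in this paper: reactions and graphs share fixed node identifiers (no subgraph matching), a reaction is taken to be applicable only when the host carries exactly its substrate weights, and a single shortest sequence is returned rather than a full pathway $(C, R)$. The reactions `r1` and `r2` are formal toy examples on four atoms of valence 1.

```python
from collections import deque

def apply_reaction(host, sigma, pi):
    """Relabel the reaction's edges in `host`; return None if not applicable."""
    if any(host.get(e) != w for e, w in sigma.items()):
        return None
    result = dict(host)
    result.update(pi)
    return result

def find_pathway(start, target, reactions):
    """Breadth-first search for a shortest list of reaction names from start to target."""
    def key(graph):
        return frozenset(graph.items())
    queue = deque([(start, [])])
    seen = {key(start)}
    while queue:
        graph, path = queue.popleft()
        if graph == target:
            return path
        for name, (sigma, pi) in reactions.items():
            nxt = apply_reaction(graph, sigma, pi)
            if nxt is not None and key(nxt) not in seen:
                seen.add(key(nxt))
                queue.append((nxt, path + [name]))
    return None   # the target is not reachable with the given reactions

start = {(1, 2): 1, (3, 4): 1, (1, 3): 0, (2, 4): 0, (1, 4): 0, (2, 3): 0}
target = {(1, 2): 0, (3, 4): 0, (1, 3): 0, (2, 4): 0, (1, 4): 1, (2, 3): 1}
reactions = {
    "r1": ({(1, 2): 1, (3, 4): 1, (1, 3): 0, (2, 4): 0},
           {(1, 2): 0, (3, 4): 0, (1, 3): 1, (2, 4): 1}),
    "r2": ({(1, 3): 1, (2, 4): 1, (1, 4): 0, (2, 3): 0},
           {(1, 3): 0, (2, 4): 0, (1, 4): 1, (2, 3): 1}),
}
print(find_pathway(start, target, reactions))   # ['r1', 'r2']
```

The search terminates because only finitely many weight assignments can be reached and each one is visited at most once.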

A graph transformation system for the analysis of metabolic pathways is 
being developed at the Technical University of Catalonia. The system is based 
on the following three main components: 

— Database of explicit chemical reactions, 

— Database of metabolic pathways, and 

— Chemical graph transformation system. 

The efficient implementation of the chemical graph transformation system 
relies on the CANON method for labeling a molecular structure with canonical 
labels [27-29], in which a molecular structure is treated as a graph with nodes 
(atoms) and edges (bonds), and each atom is given a unique numerical label on
the basis of the topology of the molecular structure. 

4 Conclusion 

Chemical reactions in a metabolic pathway are described in this paper by edge 
relabeling graph transformation rules, both as explicit chemical reactions and 
as implicit chemical reactions, in which the substrate chemical graph, together 
with a minimal set of edge relabeling operations, determines uniquely the prod- 
uct chemical graph. On the basis of explicit chemical reactions, the problem 
of constructing all pathways that can accomplish a given metabolic function of 
transforming a substrate chemical graph to a product chemical graph using a 
set of explicit chemical reactions is stated as the problem of finding an appropriate
set of sequences of chemical graph transformations from the substrate
to the product. The design of a graph transformation system for the analysis 
of metabolic pathways, based on a database of explicit chemical reactions, a 
database of metabolic pathways, and a chemical graph transformation system, 
is also described. 

The formalism of chemical graphs and explicit chemical reactions is sufficient
to describe most of the roughly 700 chemical reactions that have come
to be recognized and referred to by name within the chemistry community [11,
16, 18, 20]. Future work includes extending this formalism to take compounds
formed by ionic (instead of covalent) bonding, stereochemistry, and chirality 
into account, as well as modeling analysis problems upon more complex forms 
of biochemical networks, such as regulatory and signal transduction pathways, 
by graph transformation. 
Abstract. Chemical reactions can be represented as graph transformations.
Fundamental concepts that relate organic chemistry to graph
rewriting, and an introduction to the SMILES chemical graph specification
language, are presented. The utility of both deduction and unordered
finite rewriting over chemical graphs and chemical graph transformations
is suggested. The authors hope that this paper will provide
inspiration for researchers involved in graph transformation who might
be interested in chemoinformatic applications.



1 Background 

Few students of organic chemistry realize that there is a formal computational 
notion to the activity of writing down the products of chemical reactions. Graph 
transformation, which has appeared to meet broad interest in computer science, 
remains somewhat unnoticed in chemistry (in a formal sense) despite its obvious 
application. This paper will provide a broad outline of issues relating graph 
transformation to organic chemistry, in an attempt to provoke interest from
computer scientists in the field of chemoinformatics.

Before we begin, an overview of organic chemistry is in order. We have de- 
liberately made some omissions in our chemical explanations, in the interest of 
not obscuring the relevance of this paper to researchers involved in graph
transformation. We refer the reader who is interested in the more subtle issues
in graph-based representations of molecules (stereochemistry, tautomerism,
aromaticity, etc.) to any of the good textbooks [1], and to a friendly local
organic chemist.

Most physical things are made of molecules, collections of different types of 
atoms linked together via electronic orbitals. This paper will primarily concern 
itself with the construction of organic molecules. Such molecules contain carbon 
atoms in their structure, and are generally of biological origin or are synthesized 
from “salts of the earth” (carbon based compounds obtained from crude oil), 
via chemical reactions. A simple organic molecule is depicted in Figure 1, and
some common reactions are shown in Figure 2.

Molecules represented as graphs (where the typed “nodes” are atoms and 
typed “edges” represent bonds between atoms) have motivated solutions to 

Fig. 1. Two depictions of the same molecule. Organic chemists tend to use the depiction 
on the left, omitting the labels for carbon (C) atoms and implicitly assuming the 
presence of hydrogen (H) atoms from the bond type (note the double bond). Wedged
and dashed edges represent bonds projecting out of and into the page, respectively.
These figures primarily depict molecular connectivity; real molecules are embedded into
distributions of three dimensional conformations. Atoms generally do not lie in one
plane, and are tied to each other in a spring-like fashion by chemical bonds.




Fig. 2. Generalized depictions of three well known reactions: a) esterification, b)
acetylene-azide cycloaddition, c) Diels-Alder cycloaddition. The molecules on the left
hand side of the arrow are termed reactants and those on the right hand side, products.
The asterisk denotes any molecular fragment. Depictions usually omit atoms that
are not part of the main carbon skeleton, such as the H+ (hydrogen cation) and Cl-
(chloride anion) in a). Note how some reactions can give multiple products, as in c).



The Potential of a Chemical Graph Transformation System 






problems in mathematics, under a field loosely called “mathematical chemistry”.
Chemical questions have led to mathematical analogies which have inspired the
investigation of topics such as constructive/analytic enumerations, graph
canonicalization and (sub[2, 3])graph isomorphism [4]. These results provide a deeper
understanding of molecular space and have led to indispensable practical tools
for machine aided storage, query and analysis of molecular structures.

Organic reactions can be represented as graph transformations; in fact, any
chemical reaction can generally be represented by a sequence of applications of 
bond breaking and bond creation (atoms are generally conserved in chemical 
transformations). These bond manipulations physically correspond to sets of 
electrons re-arranging their quantum configurations to the most probable states. 
The precise order of these re-arrangements (which can occur on the order of 
femtoseconds) is the subject of reaction mechanism (see Figure 10), and is a 
fundamental physical concept that practicing chemists use to understand how 
certain products were achieved from a mixture of reactants (see footnote 1). From
here on, we will refer to chemical reactions as the physical “territory” and
chemical graph transformations as the “maps” which represent them.

Typically, multiple steps of chemical reactions are composed together (be- 
ginning with a set of starting materials) to produce synthetic routes to more 
complex molecules that have been targeted for construction, which are generally 
either 

1. molecules which have been isolated from scarce natural sources, have
medicinal use, and need to be synthetically produced on a large scale.

2. totally new molecules which have some predicted utility. 

This paper will concern itself with the application of graph transformation to- 
wards both of these goals. An example synthetic route determined for an antibi- 
otic is shown in Figure 3. 




[Figure 3: reaction scheme. Reagents and conditions legible in the original
figure, in order: 1. iPr2NEt, EtOAc, heat; 2. HCO2NH4, 10% Pd/C. CBZ-Cl,
NaHCO3. 1. nBuLi, THF, -78°C; 2. NaN3, DMF, 75°C. 1. 10% Pd/C, H2, EtOAc;
2. Ac2O, pyr.]


Fig. 3. The synthetic route to an antibiotic compound [5] that goes by the commercial
name Zyvox.



The lack, to date, of theoretically founded graph rewriting applied to chemistry
may be attributed to the shortage of practical underlying tools for its
implementation, and to the intractability it inherits from the requisite subgraph
isomorphism step. It is important to note that despite the theoretical
intractability of subgraph isomorphism, constraints and heuristics can produce
practically efficient implementations, as demonstrated by the fact that chemical
structure queries are routinely performed using subgraph isomorphism in many
chemical database systems.

1 While chemists generally understand mechanism as the movements of electrons, the
physical details of how this happens are remarkably complex, and the focus of many an
investigator. We still cannot easily predict reactions based on first principles alone,
which explains why new reactions are generally discovered by the experimentalist.

By considering molecules as expressions and the chemical transformations 
as rewriting rules, we propose the construction of a software system that would 
facilitate the implementation of automated deduction and exhaustive finite un- 
ordered rewriting over molecules and reactions. We outline the significance of 
such systems to organic chemistry in this paper. In the next section we describe 
a well known chemical graph specification language that such a system would 
likely employ. 



2 The SMILES Graph Specification Language 



Organic molecules are generally drawn in the canonical form as shown in this 
paper, a visual language which has evolved since the beginning of chemistry as 
the molecular nature of matter has become better understood. 

A significant amount of historical effort [6, 7] has been spent on designing
textual descriptions of molecular graphs which could be used for machine aided 
storage, query and analysis. The IUPAC naming system (which is actively refined 
as new molecules are discovered) is the standard naming system which chemists 
generally use in publications for unambiguous descriptions of molecules. IUPAC 
names are generally unwieldy when searching for molecules or even storing them 
in databases. The reader may recognize IUPAC names of organic molecules by 
looking on the ingredients label of many foodstuffs 2 and medicines. 

The simple SMILES [8] language has emerged as the de facto standard for
representing molecules in databases. While the grammar itself is rather
straightforward, the generation of canonical [9] SMILES strings for molecules is
of immense utility. The SMILES framework provides canonicalization, a convenient
query-specification language (SMARTS) and representations for chemical
reactions (SMIRKS).

The SMILES language contains nomenclature for describing atoms, bonds 
and for branches and closing loops. All atoms are denoted by their standard 
chemical symbols from the periodic table of elements. For example, carbon is 
‘C’. Bonds are denoted as ‘-’ for single bonds, ‘=’ for double bonds, ‘#’ for
triple bonds and ‘:’ for aromatic bonds. SMILES attempts to generate graph
descriptions as small as possible so if a bond is not specified it is assumed to be 
either a single bond or an aromatic bond. Aromatic bonds are problematic in 
nature for most rewriting systems since they are bonds that resonate between 
single and double bonds and must be perceived by the SMILES parsing system. 
A simple SMILES example is shown in Figure 4. 

2 The IUPAC name for the artificial sweetener “aspartame” is
aspartyl-phenylalanine-L-methyl ester.

Fig. 4. A simple four carbon chain (butane), specified by the SMILES string “CCCC”.

The SMILES string “[CH3]-[CH2]-[CH2]-[CH3]” could also be used to de- 
scribe the molecule in Figure 4, where hydrogen atoms are listed explicitly. 

SMILES uses brackets to indicate isomers or non-standard configurations of 
atoms. To indicate closures, SMILES uses numbers after the atom designation. 
The SMILES string for a simple cyclic molecule is shown in Figure 5. 



Fig. 5. A four carbon ring (cyclobutane), specified by the SMILES string “C1CCC1”.

Finally, branches are enclosed in parentheses. The first atom inside the
parentheses is attached to the atom immediately preceding the opening
parenthesis (Figure 6).



Fig. 6. A branching example in cyclobutanone, specified by the SMILES string
“C1CC(=O)C1”.
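
The conventions described so far (atom symbols, bond symbols, ring-closure
digits and parenthesised branches) can be illustrated with a small parser. The
following Python sketch is not part of the SMILES framework or of FROWNS; the
names `parse_smiles` and `summary` are hypothetical, and only a small subset of
the language is handled (no brackets, aromaticity, charges or stereochemistry).

```python
from collections import Counter

BOND_ORDER = {"-": 1, "=": 2, "#": 3}

def parse_smiles(smiles):
    """Parse a tiny SMILES subset into (atoms, bonds).

    Supported: the one-letter atoms C, N, O, S, F, the bond symbols - = #,
    parenthesised branches, and single-digit ring closures.
    """
    atoms = []        # atoms[i] is the element symbol of atom i
    bonds = {}        # frozenset({i, j}) -> bond order
    prev = None       # index of the atom the next atom attaches to
    pending = 1       # bond order for the next bond (single if unspecified)
    stack = []        # branch points saved at "("
    open_rings = {}   # ring digit -> (atom index, bond order at opening)

    for ch in smiles:
        if ch in BOND_ORDER:
            pending = BOND_ORDER[ch]
        elif ch == "(":
            stack.append(prev)
        elif ch == ")":
            prev = stack.pop()          # return to the branch point
        elif ch.isdigit():
            if ch in open_rings:        # second occurrence closes the ring
                other, order = open_rings.pop(ch)
                bonds[frozenset((other, prev))] = max(order, pending)
            else:                       # first occurrence opens the ring
                open_rings[ch] = (prev, pending)
            pending = 1
        elif ch in "CNOSF":
            atoms.append(ch)
            idx = len(atoms) - 1
            if prev is not None:
                bonds[frozenset((prev, idx))] = pending
            prev, pending = idx, 1
        else:
            raise ValueError(f"unsupported SMILES character: {ch!r}")
    if open_rings or stack:
        raise ValueError("unclosed ring or branch")
    return atoms, bonds

def summary(smiles):
    """Element counts and bond-order counts: a cheap graph invariant."""
    atoms, bonds = parse_smiles(smiles)
    return Counter(atoms), Counter(bonds.values())
```

With this sketch, `summary("C1CC(=O)C1")` and `summary("O=C1CCC1")` are equal,
which is consistent with both strings describing cyclobutanone. Equal summaries
are only a necessary condition for two strings to describe the same molecule;
deciding equality in general requires canonicalization or a graph isomorphism
test.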

Note that all of the following SMILES strings are valid for describing the 
molecule in Figure 6 (the last one being the canonical SMILES string): 



SMARTS is an extension of SMILES that allows one to query for sub-structures in
molecules described by SMILES strings. SMARTS strings look very similar to
SMILES, and most SMILES strings are also valid SMARTS. However, SMARTS adds
AND and OR operations for graph matching. For instance, “[O,S]CCC” will
match a chain of four atoms in which an oxygen or a sulphur is bonded to the
first of three consecutive carbons.
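
The OR operation in a query such as “[O,S]CCC” can be read as a set of allowed
elements at each position of the pattern. The sketch below is a simplified
illustration and not the SMARTS matching algorithm: the name `match_chain` is
hypothetical, only linear patterns are handled, and bond orders are ignored.

```python
def match_chain(atoms, bonds, pattern):
    """Return all atom-index tuples that match a linear pattern.

    atoms:   list of element symbols, e.g. ["O", "C", "C", "C"]
    bonds:   set of frozenset({i, j}) pairs (bond order is ignored here)
    pattern: list of sets of allowed elements, one set per pattern atom
    """
    neighbours = {i: set() for i in range(len(atoms))}
    for bond in bonds:
        i, j = tuple(bond)
        neighbours[i].add(j)
        neighbours[j].add(i)

    matches = []

    def extend(path):
        # Backtracking search: grow the partial embedding one atom at a time.
        if len(path) == len(pattern):
            matches.append(tuple(path))
            return
        allowed = pattern[len(path)]
        candidates = neighbours[path[-1]] if path else range(len(atoms))
        for k in candidates:
            if k not in path and atoms[k] in allowed:
                extend(path + [k])

    extend([])
    return matches

# "[O,S]CCC" written as a list of allowed-element sets.
O_OR_S_CCC = [{"O", "S"}, {"C"}, {"C"}, {"C"}]
```

For a four-atom chain with an oxygen or a sulphur at one end the pattern
matches once; for a plain four-carbon chain it does not match at all. The
exponential worst case of this search is the subgraph isomorphism cost
mentioned earlier; element constraints prune most candidates in practice.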

Finally, SMIRKS is a combination of SMILES and SMARTS used for specifying
chemical graph transformations. Atom mapping is the main addition for
describing reactions, with the enumerated labeling of atoms used to describe the
embedding. A simple example, taken from Daylight Inc.’s SMIRKS documentation,
is the esterification of a carboxylic acid, shown in Figure 7.

Fig. 7. A generalized form of the reaction a) in Figure 2, described by the following
SMIRKS string:
“[C:1][C:2](=[O:3])[O:4][H].[C:5][O:6][H]>>[C:1][C:2](=[O:3])[O:6][C:5].[O:4]”.
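
Read as a graph transformation, the atom-mapped string in Figure 7 breaks the
bond between atoms 2 and 4 and creates a bond between atoms 2 and 6, while all
atoms are conserved. The following Python sketch applies such a bond-level rule
to a host graph once an embedding is known. It is an illustration under stated
assumptions, not the SMIRKS implementation of Daylight or FROWNS: the name
`apply_rule` is hypothetical, hydrogens are left implicit, and the embedding is
supplied by hand rather than found by subgraph matching.

```python
def apply_rule(bonds, embedding, break_bonds, make_bonds):
    """Apply a bond-level rewrite to a molecular graph.

    bonds:       dict frozenset({i, j}) -> bond order (the host graph)
    embedding:   dict atom-map number -> atom index in the host graph
    break_bonds: iterable of (m, n) map-number pairs whose bond is removed
    make_bonds:  dict (m, n) -> bond order of the bonds to create
    Returns a new bond dict; atoms are conserved, so the atom list is unchanged.
    """
    result = dict(bonds)
    for m, n in break_bonds:
        key = frozenset((embedding[m], embedding[n]))
        if key not in result:
            raise ValueError(f"rule does not apply: no bond {m}-{n}")
        del result[key]
    for (m, n), order in make_bonds.items():
        result[frozenset((embedding[m], embedding[n]))] = order
    return result

# Heavy atoms of an acid and an alcohol, indexed 0..5 for map numbers 1..6.
atoms = ["C", "C", "O", "O", "C", "O"]
reactants = {
    frozenset((0, 1)): 1,   # C:1 - C:2
    frozenset((1, 2)): 2,   # C:2 = O:3
    frozenset((1, 3)): 1,   # C:2 - O:4
    frozenset((4, 5)): 1,   # C:5 - O:6
}
embedding = {m: m - 1 for m in range(1, 7)}
products = apply_rule(reactants, embedding, [(2, 4)], {(2, 6): 1})
```

After the call, atom 4 (index 3) has no bonds left to other heavy atoms, which
corresponds to the separate “[O:4]” component on the right hand side of the
rule.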



FROWNS is a free implementation of the SMILES system by one of the
authors [10]. The FROWNS implementation of SMIRKS, while incomplete, was
started mainly because of a disconnect between how SMILES and SMARTS
handle hydrogen atoms. Daylight has provided a wealth of information on
the SMILES system [11], and continually improves the SMIRKS model, but the
authors feel that there may be room for improvement by combining work done
in the discipline of graph transformation with these current approaches to
chemical representation.

3 Aspects of a Chemical Graph Transformation System 

The authors imagine a software system that would be linked to a large database 
of chemical structures (on the order of millions of compounds), reactions, and 
validated synthetic routes where they can be manipulated in an algorithmic and 
efficient fashion. Before the implementation of such a system begins, the
authors feel that careful attention must be paid to the following issues.

1. Global Context Sensitivity. The entire physical picture of a chemical
transformation is not completely captured by only depicting the movements of
bonds between atoms. Physical reactions have properties (yield, rate, cost,
etc.) which are dependent on the global context in which the reaction is
performed. Almost all chemical reactions are run in liquid solvent (which
itself can be a mixture of different molecules), the bulk properties of which
(polarity, viscosity, specific heat, etc.) can greatly affect yield. Temperature,
the presence of catalyst and the presence of radiation are all aspects of a
chemical reaction that are generally reported in any publication where an
instance of that reaction is used (typically as one step in a multi-step
synthesis). Any chemical graph transformation system must be able to associate
such global physical descriptors with chemical graph transformations.
2. Local Context Sensitivity. Reactions are generally depicted in such a manner
that the local molecular context is implied, or is generally understood by
chemists (as in Figure 2). However, a chemical graph transformation system
must be able to specify local preconditions under which a particular chemical
graph transformation can be applied. For instance, many potential reaction
rules cannot be physically applied, due to the three dimensional conformation
of reactive groups (Figure 8).




Fig. 8. Transformation a) in Figure 2 is applied once to two different molecules. In
a) we would predict that the more abundant product (boxed) comes from the
intramolecular reaction, as the molecule will more “quickly” react internally with
itself than with another molecule in solution. In b), the molecule is constrained
conformationally so that only the intermolecular product is formed (the graphs shown
in the reaction diagram actually depict physical multi-sets of molecules).



3. Non-determinism. Organic chemistry is in fact a game of numbers. By blindly
applying chemical graph transformations we would find no chemical would
be stable, as there are always routes that could operate on virtually any
structure. Chemists take advantage of the fact that many potential reactions
are separated by orders of magnitude in terms of rate (or reactivity).
By applying local and global contexts in such a way that the chemist knows
the desired reaction will occur orders of magnitude faster than any other
potential reaction, she can isolate practical yields of intended product
(Figure 9). Although a pure chemical graph transformation system will provide
products “up to reactivity”, the system should include provisions or hooks to
physical simulation that will predict if certain reactions will (not) occur. In
many cases the outcome of a particular reaction can be tuned to one product
of a large potential set. A chemical graph transformation system needs to
be able to backtrack to keep count of all possible outcomes of a reaction, to
infer productive synthetic routes.

Fig. 9. An example [12] of a non-deterministic application of a chemical transformation.
Either or both carbonyl groups (carbon double bonded to oxygen) can be transformed,
but under the conditions of the reaction, only one (boxed product) is physically
abundant. This observed selectivity in reactivity is due to both the reaction
conditions (written above the arrow) and the three dimensional conformation of the
molecule.



4. Mechanism. Although we have not provided a background on chemical mech- 
anism, we anticipate that a chemical graph transformation system would be 
more useful if reactions could be encoded and applied at different “levels of 
detail” (Figure 10). Students are often encouraged to not simply memorize 
organic transformations, but rather the mechanisms that they go through, 
which allows them to more easily infer the outcomes of particular reactions. 
In a similar sense, the ability to incorporate information about mechanism
would greatly extend the utility of the system we propose.

Fig. 10. A more detailed picture of reaction a) in Figure 2. The double dots above the
oxygen (O) atom are an explicit representation of an electron pair. The curved arrows 
denote the movement of electron pairs. 
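
The first two issues above, global and local context sensitivity, suggest that
a rule should carry more than its left and right hand sides. One possible shape
for such a rule is sketched below in Python. All names (`ContextualRule`,
`required`, `precondition`, the descriptor keys) are hypothetical, and the
numeric ranges in the example are placeholders rather than chemical data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

@dataclass
class ContextualRule:
    name: str
    # Global context: allowed (low, high) range per bulk descriptor.
    required: Dict[str, Tuple[float, float]] = field(default_factory=dict)
    # Local context: predicate over properties of the matched site.
    precondition: Callable[[dict], bool] = lambda site: True

    def applicable(self, site: dict, environment: Dict[str, float]) -> bool:
        """True if the environment and the matched site both permit the rule."""
        for key, (low, high) in self.required.items():
            if key not in environment or not low <= environment[key] <= high:
                return False
        return self.precondition(site)

# Placeholder values, for illustration only.
rule = ContextualRule(
    name="esterification",
    required={"temperature_C": (20.0, 80.0)},
    precondition=lambda site: not site.get("sterically_blocked", False),
)
```

A system built along these lines would first find an embedding of the left
hand side, then call something like `rule.applicable(site, environment)`
before rewriting. Hooks to physical simulation, as discussed under
non-determinism, could be supplied through the same predicate.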



4 Utility of a Chemical Graph Transformation System

The references that are provided in this section demonstrate that there are many
researchers who have implemented and applied graph rewriting over molecular
graphs. The systems that are employed in these investigations are rather
specialized for their particular application, and are not instances of a more general
graph transformation framework. Many day-to-day chemical questions are easily
specified in terms of graph rewriting, such as: “What are all the potential
mechanisms that could underlie this reaction?”, “What is the closest molecule I
can make to my target, using only these reactions and materials?”, “What
potential degradation pathways can this molecule undergo?”. Despite this, such
a general framework does not exist.

Chemists require a language in which these questions can be specified as easily
as they are thought of, and a system which can answer these queries efficiently.
We hope that knowledge from automated deduction, computer aided
software engineering and visual specification languages can be applied to further
this goal. We outline the general application areas in greater depth below.

4.1 Computer Aided Organic Synthesis (CAOS) 

An immense amount of knowledge in chemistry has been discovered along the 
way when chemists have attempted to construct molecules identical to those iso- 
lated from biological sources. We can consider these target structures as chemical 
theorems and the available set of starting materials and reactions which can be 
applied to them as chemical axioms. The computational process by which one 
tries to apply these axioms to physically construct (chemically prove) the target
structures can be considered chemical deduction. Not only is it a computationally 
difficult task, but the physical constraints introduce uncertainty which requires 
that a large number of laborious experiments be done. 

The problem of finding a synthetic route to a target is equivalent to the
semigroup word problem and is thus undecidable [13]. With added constraints,
such as bounding the number of rule applications and/or restricting the set of
rewrite rules to those that only increase the size of the intermediates,
the problem is equivalent to the tiling problem and is NP-complete [13]. These
theoretical complexity bounds suggest why CAOS is difficult and not widely
used, even setting aside the difficulties arising from the physical
unpredictability of chemical transformations.

The intuitive approach to chemical deduction, first formally outlined by E. 
J. Corey, is called retrosynthetic analysis [14]. The chemist recalls approximately
130 chemical transformations and attempts to mentally disconnect the target 
molecule into smaller pieces by applying the reversed transformation (the left 
and right hand side of the rule are switched, as in Figure 11), until he is left 
with starting materials that can be obtained from nature. A flexible chemical 
rewriting system should be able to reverse and apply rules, in addition to having 
an efficient mechanism by which preconditions can be specified and tested. 
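
The retrosynthetic strategy, combined with the bound on the number of rule
applications mentioned above, can be sketched as a depth-limited backward
search. In the Python sketch below molecules are reduced to opaque names and
rules to (reactants, product) pairs, so no graph matching is performed; the
function name `retrosynthesize` and the toy rules are hypothetical.

```python
def retrosynthesize(target, rules, available, max_depth):
    """Return a synthesis tree for target, or None if none is found.

    rules:     list of (reactants, product) pairs; reactants is a tuple of names
    available: set of starting materials
    max_depth: bound on the number of reversed rule applications along a branch
    A tree is either a name (a starting material) or (product, [subtrees]).
    """
    if target in available:
        return target
    if max_depth == 0:
        return None
    for reactants, product in rules:
        if product != target:
            continue
        # Apply the rule in reverse: replace the product by its reactants.
        subtrees = [
            retrosynthesize(r, rules, available, max_depth - 1) for r in reactants
        ]
        if all(tree is not None for tree in subtrees):
            return (target, subtrees)
    return None

# Toy rule set with opaque names (placeholders, not real reaction data).
toy_rules = [
    (("acid", "alcohol"), "ester"),
    (("precursor",), "acid"),
]
toy_available = {"alcohol", "precursor"}
```

With a depth bound of two the search finds a route to "ester"; with a bound of
one it fails, because "acid" itself must first be disconnected. A real system
would replace the name comparison with subgraph matching against the right
hand side of each rule and would test preconditions before each step.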

Current CAOS software systems (LHASA [15], Syngen, SYNCHEM, etc.) rely
on expert-system based approaches, and differ significantly in terms of the degree
of interactivity, implemented constraints and supporting knowledge bases. It is
important to note that none of these systems are considered widely used tools by
practicing chemists [13]. While it is not our intent to be dismissive of these efforts
(despite their impressive extensions, such as LHASA’s LCOLI), these tools are
not instances of a more general framework based on graph transformation and
appear disconnected from general notions of deduction over graph grammars.

Fig. 11. A retrosynthetic transformation (denoted with the double lined arrow), using
reaction b) from Figure 2. This is only a formal construction; the actual reaction
happens in the forward sense [16].

4.2 Exhaustive Finite Unordered Rewriting for Virtual Libraries 

A common approach to the solution of computationally difficult problems is the
strategy of “divide and conquer”: dividing a problem into easier sub-problems,
and combining the resulting solutions to produce a solution of the original
instance. Divide and conquer is an apt description of the underlying approach to
searching the physically intractably large space of small organic molecules
(estimated to have between 10^62 and 10^63 members [17]) via combinatorial chemistry.
By employing parallel synthetic routes and restricting the set of reactions and
starting materials, chemists aim to quickly produce libraries of tens of thousands
of different compounds that can be tested for some property. The reactions and
starting materials should not only be practical and efficient, but provide libraries
which maximize diversity, so as to sample as large a volume of molecular space as
possible with the fewest number of compounds. By using a database of starting
materials and encoding chemical transformations as graph rewrite rules, those
rules can be applied in an exhaustive fashion for some practical number of steps
(see footnote 3) to produce a virtual library of compounds. Such a library could
be subjected to in silico tests, to increase the efficiency with which chemists
find new and useful compounds.
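
Exhaustive application for a bounded number of steps can be sketched as a
simple closure computation. The Python sketch below is an illustration only:
`enumerate_library` and `toy_esterification` are hypothetical names, molecules
are condensed-formula strings rather than graphs, and the string test used to
recognise acids and alcohols is a toy encoding that stands in for subgraph
matching.

```python
from itertools import product

def enumerate_library(starting, rules, steps):
    """Apply every rule to every ordered pair of known molecules, `steps` times."""
    known = set(starting)
    for _ in range(steps):
        new = set()
        for rule in rules:
            for a, b in product(known, repeat=2):
                new |= rule(a, b)
        if new <= known:
            break               # fixed point reached before the step bound
        known |= new
    return known

def toy_esterification(acid, alcohol):
    """Toy rule: strings ending in COOH are acids, other strings ending in OH
    are alcohols. Returns a set with one ester string, or an empty set."""
    is_acid = acid.endswith("COOH")
    is_alcohol = alcohol.endswith("OH") and not alcohol.endswith("COOH")
    if is_acid and is_alcohol:
        return {acid[:-2] + "O" + alcohol[:-2]}
    return set()

library = enumerate_library(
    {"CH3COOH", "HCOOH", "CH3OH", "C2H5OH"}, [toy_esterification], steps=3
)
```

Two acids and two alcohols give four esters, so the library holds eight
entries, and the loop stops after the second pass because nothing new appears.
With real rewrite rules the set usually grows combinatorially with each step,
which is why a small step bound matters.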

The commercial SMILES based LUCIA package from Sertanty Inc. promises 
an extremely general approach to virtual library creation, but the authors did 
not have access to this package at the time of writing. 

4.3 Chemical Networks 

Chemists typically perform reactions step by step with as few components as
possible, generally to reduce cost and the possibility of any side products that
may interfere in purifying the desired molecule. Pure substances are important
for ease in characterization and reproducibility (see footnote 4). In general, it is
difficult to predict all the possible outcomes when many different reactive species
are put together in the same “soup”.

3 Thermodynamics prevents any reaction from completely converting one molecule to
another. Even as we compose >99% yield reactions together, the overall yield drops
quickly as a function of the number of steps. The number of allowable steps in a
large scale commercial synthesis varies with the value of the product, but a useful
number to consider is ten.
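
The remark in footnote 3 about overall yield can be made concrete: for a linear
route the overall fractional yield is the product of the step yields. The
helper name `overall_yield` below is hypothetical and the yields are
illustrative values, not measurements.

```python
from math import prod

def overall_yield(step_yields):
    """Overall fractional yield of a linear route: the product of step yields."""
    return prod(step_yields)

ten_steps_at_99 = overall_yield([0.99] * 10)   # roughly 0.90
ten_steps_at_90 = overall_yield([0.90] * 10)   # roughly 0.35
```

Ten steps at 99% each still lose about a tenth of the material, and ten steps
at 90% each leave only about a third, which illustrates why the number of steps
in a practical route is kept small.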

One of the fascinating aspects of biology is that many different and reactive
small molecules productively co-exist in the same chemical soup, inside the
living cell. Some suggest that it is the resulting complex network of interactions
that distinguishes animate biological chemistry from everything else [18]; indeed,
recent studies have espoused a “network-centric” [19-21] viewpoint of biology.




Fig. 12. A Diels-Alder chemical network [22] from Benko et al. Boxed numbers
represent the rate of the reaction between reactants (pointing in) and the product
(pointed to).



A chemical graph transformation system would offer an algorithmic approach
for enumerating all potential outcomes of “chemical networks”, when a large
number of potential reactions are applied iteratively to a set of starting
materials. An interesting example [22] was reported where the Diels-Alder reaction
(transformation c) in Figure 2) was iteratively applied for three cycles to an
initial set of five molecules. The resulting set of products is shown in Figure 12,
taken from the referenced paper. Generalizing these systems to arbitrary
reactivity would provide fertile ground for inferring outcomes of complicated reactive
systems, and stochastic application could even be used for physical simulation.

4 Consumers would be (rightfully) wary of purchasing medicine that was advertised
as “80% pure”.

5 Conclusion 

We hope that this paper will act as a springboard for investigators to explore 
and apply techniques of graph transformation to chemistry. We restricted our 
suggestions to those related to reactivity, but there are obvious applications to 
the more visual aspects of chemistry, such as molecular modeling. An efficient 
implementation of a graph rewriting system would provide chemical investigators
with an ability to more robustly determine retrosynthetic strategies, design
synthetically accessible virtual libraries and simulate complicated chemical soups
of reactive molecules. An even greater hope is that such physical analogies will
provide reciprocal insight to the computer science of graph transformation.

Abstract. As graph transformation systems evolve, they are used to solve complex
problems and the resulting specifications tend to get larger. Specifications
of more than one hundred pages suffer from the same problems as
large applications written in programming languages like C++ or Java. Under
the term programming in the large, many different concepts have been developed
to aid the solution of these problems. However, most graph transformation
systems lack support for them.

With the introduction of UML-like packages, the PROGRES environment pro- 
vides basic support for specifying in the large. Now that packages have been used 
in many different specifications, their shortcomings have become obvious. Ad- 
ditionally, our experience with large specifications showed that packages alone 
are not sufficient and therefore we have developed concepts for modularizing 
and coupling specifications. Our graph database provides the runtime-support re- 
quired by these concepts. 



1 Introduction 

The idea of graph grammars arose in the late 1960s. With the help of graph
grammars, arbitrarily complex structures and relations can be represented [1]. Our
department has been doing research in this area for many years. In 1989 we started the
development of the PROGRES language. PROGRES (PROgrammed Graph REwriting 
Systems) is a very high level programming language and an operational specification 
language for rapid prototyping [2]. The language is used to define, create, and manip- 
ulate graphs which are directed, typed, and attributed. The internal structure and the 
dynamic transformations of graphs are modeled in an intuitive way, for example the 
modification of complex structures can be specified visually. 

To handle this voluminous language, a special environment emerged, which offers a 
comfortable way to make use of all the features. The PROGRES environment provides 
three different tools: First of all, the environment includes a syntax-controlled editor 
with an integrated analyzer, which reports all violations of the static semantics
of the language. Furthermore, an interpreter with a corresponding graph browser aids 
the user in debugging the specification by supporting the incremental execution of a 
specification and presenting the user the resulting graph structure in a graphical view. 
Finally, a compiler translates the specification into adequate and efficient C-code [3]. 
With this sophisticated and stable environment a user can easily specify complex graph 
transformation systems in both textual and visual form. The graph schema and the
actual graph are saved in the non-standard database GRAS (GRAph Storage). GRAS [4]
has also been developed at our department and is a special database management
system for storing graphs. As mentioned earlier, the PROGRES environment translates the
specifications into efficient C-code, which can be loaded into the UPGRADE
framework (Universal Platform for GRAph-based DEvelopment). UPGRADE [5]
automatically creates a prototype from the given C-code, which offers the user an adequate
graphical interface. The layout of the generated prototype can be adapted to the user’s
needs. With the combination of PROGRES and UPGRADE several different tools have
been developed, for example AHEAD [6], CHASID [7], and E-CARES [8].

As we have been specifying tools and experimenting with graph grammars for
several years, the specifications have become larger and more complex.
PROGRES provides a lot of useful concepts, but some specifications exceed the capabilities
of the current PROGRES language. Additionally, the users need new features to
structure large specifications in a clear manner. For example, decomposing a large
specification into smaller parts is desirable, so that the user can work in a
well-structured way and can reuse parts of a certain specification in another one.
To solve this problem, PROGRES has been extended with a package concept in [9].
As this concept has been used by some specifiers, its shortcomings have become
obvious. Furthermore, some users want
to distribute the complex functionality of a tool over several prototypes which interact
with each other. But the coupling of multiple prototypes is currently not supported by
PROGRES. So we need new concepts in PROGRES for specifying in the large.

In this paper we will analyze the existing concepts in PROGRES for specifying in 
the large and revise them. Additionally, we present new concepts that will support this 
task. To demonstrate our approach, we will introduce AHEAD as a complex example 
implemented at our department in Section 2. In Section 3 we will discuss the problems 
of large specifications in PROGRES using the AHEAD example. Afterwards, we will 
point out approaches for the distribution of graph transformations that can be realized by 
extending PROGRES. In Section 5 we will present the Gras/GXL database management 
system and its role within the PROGRES system to support specifying in the large. At 
the end of this paper, we will give an overview of related work and summarize our 
results. 



2 AHEAD: An Example for a Large PROGRES Specification 

To illustrate the needs for specifying in the large, we will introduce a tool called AHEAD 
(Adaptable and Human-centered Environment for the mAnagement of Development 
Processes). As the development of products in disciplines such as mechanical, chem- 
ical, or software engineering is very complex and dynamic, AHEAD has been devel- 
oped at our department. AHEAD constitutes a management system which supports the 
coordination of engineers through integrated management of products, activities, and 
resources. Therefore, AHEAD is composed of three combined partial models - CoMa, 
DYNAMITE, and RESMOD - which are all specified in PROGRES. Figure 1 shows a 
sample screenshot of AHEAD where the modeling of activities is demonstrated. Before 
we present these components in detail, we will explain some basic terms. A process
is an ordered set of working steps to achieve a certain result. The working steps are
called tasks. The goal of a process consists in the creation of a certain product. In the
case of software engineering the desired product is an extensive document. To achieve
this objective, tasks consume input documents and produce output documents, which
serve again as input for other tasks, like requirements definitions and design documents.
For the execution of a task, resources are needed. We distinguish between human and
non-human resources (for example, computers).

Fig. 1. Screenshot of the AHEAD system.

When developing for example a complex software system, configuration manage- 
ment is needed, which controls the evolution of documents during the long and dynamic 
process. The documents describe the results of activities, such as requirements
definition, software architectures, and module implementations. Therefore, CoMa
(Configuration MAnagement) [10] has been developed for the administration of
these documents. The documents can be created within different tools. A configuration 
is composed of the documents and their dependencies. The documents and the config- 
urations are subject to version control. Within CoMa composition hierarchies can be 
modeled, which determine the components a (sub-)system consists of. Consequently, 
the tool supports the engineers in maintaining system consistency. The model of CoMa 
is generic and can be adapted to a specific process. 

To model dynamic development processes, AHEAD uses DYNAMITE (DYNAMIC
Task nEts)[11]. DYNAMITE is based on dynamic task nets and provides the seamless
interleaving of planning, editing, analyzing, and execution of task nets. A task net con- 
sists of tasks which are connected through different relations. The execution sequence 
of tasks is determined by control flows. Data flows connect the input and output of tasks. 
Feedback flows represent feedback in the development process, for example when a task
has turned out to be incorrect. DYNAMITE also offers the possibility to define complex
tasks which can consist of several other tasks. A task is composed of an interface and 
a realization. The interface describes the purpose and therefore defines all necessary 
information for the environment, for example inputs, outputs, pre- and postconditions. 
On the other hand, the realization determines how this task is being executed. 
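
The structure of a task net can be illustrated with a minimal Python sketch. It is not part of DYNAMITE or the PROGRES specification; the class TaskNet, its method names, and the sample task names are assumptions made only for illustration. Control flows are stored as predecessor sets, so a topological sort yields one admissible execution sequence, while data flows record which documents a task consumes.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class TaskNet:
    """Hypothetical task net: tasks linked by control flows and data flows."""
    control_flows: dict = field(default_factory=dict)  # task -> set of predecessor tasks
    data_flows: list = field(default_factory=list)     # (producer, document, consumer)

    def add_task(self, name):
        self.control_flows.setdefault(name, set())

    def add_control_flow(self, before, after):
        self.add_task(before)
        self.add_task(after)
        self.control_flows[after].add(before)

    def add_data_flow(self, producer, document, consumer):
        self.data_flows.append((producer, document, consumer))

    def execution_order(self):
        # Control flows determine the execution sequence of the tasks.
        return list(TopologicalSorter(self.control_flows).static_order())

    def inputs_of(self, task):
        # Data flows connect the outputs of one task to the inputs of another.
        return [doc for _, doc, consumer in self.data_flows if consumer == task]


net = TaskNet()
net.add_control_flow("Requirements", "Design")
net.add_control_flow("Design", "Implementation")
net.add_data_flow("Requirements", "requirements definition", "Design")
net.add_data_flow("Design", "design document", "Implementation")
order = net.execution_order()
```

Feedback flows and the split of a task into interface and realization are left out of the sketch to keep it short.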

RESMOD (RESource management MODel)[12] manages the resources which are
needed for the development process. All available resources are listed in RESMOD. 
Human and non-human resources are modeled in a uniform way. They have to be de- 
clared with their name, specific attributes, and their required resources. For example, a 
certain software system requires for the execution a computer which fulfills its require- 
ments. A resource configuration is composed of a set of resources; one resource can 
be contained in several configurations. Additionally, RESMOD distinguishes between 
plan and actual resources. Plan resources support the manager in planning the resources 
for a project, for example he needs two programmers and three computers. In contrast 
to that, actual resources are concrete resources that can be mapped to the plan resources 
- like the two programmers Mrs. Smith and Mr. Scott. Furthermore, the manager can 
acquire special project resources which are only used in a certain project and are not 
employees in the company. 

With CoMa, DYNAMITE, and RESMOD, AHEAD supports managers and engi- 
neers in the development process of a complex system. AHEAD can be used in various 
domains, because the models of the three parts are generic and can be easily adapted. 
Additionally, through the combinations of the three components all complex relations 
within a complicated development process can be described, which aids maintaining 
system consistency. 



3 PROGRES and Specifying in the Large 

As we have seen in Section 2, AHEAD is a complex management system which of- 
fers an extensive support for managers and engineers during the development process. 
AHEAD is modeled in PROGRES and because of all its features and the part models 
the resulting specification is very large and complicated. In the following we present 
the current state of PROGRES regarding the handling of huge specifications, which 
includes packages and modularization. 

3.1 Package Structure 

In [9] PROGRES was extended by a package structure, which is based on the package 
concept of the UML (Unified Modeling Language) [13]. The package structure can be 
used to define arbitrary nested packages within a specification. These packages allow 
the specification of abstract data types or graph classes such as binary trees. 

A PROGRES package defines a name space for all the contained declarations, for 
example for the node classes and graph transformations. The declarations can be
attributed with visibilities such as private, public, and protected, similar to the corresponding
semantics of UML[14]. With the help of these attributes, the interface of a package is
determined and so the access between two related packages is regulated. A package can 
import other packages, which enables the access on all public elements of the referred 
package. Additionally, specifiers may define inheritance hierarchies so that the inherit- 
ing package has access to the public and protected components of the superior package 
and can overwrite them. Packages may also be nested to allow successive refinement 
and structuring. 
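
The access rules described above can be sketched in a few lines of Python. This is only an illustration of the visibility semantics, not PROGRES syntax; the class Package, the method accessible, and the sample package names are hypothetical. The sketch deliberately looks only at the direct parent of an inheriting package.

```python
from dataclasses import dataclass, field

PUBLIC, PROTECTED, PRIVATE = "public", "protected", "private"


@dataclass
class Package:
    """Hypothetical package: a name space of declarations with visibilities."""
    name: str
    declarations: dict = field(default_factory=dict)  # declaration name -> visibility
    imports: list = field(default_factory=list)       # imported packages
    parent: "Package | None" = None                   # package this one inherits from

    def accessible(self):
        """Own declarations, public declarations of imported packages, and
        public or protected declarations of the direct parent package."""
        names = set(self.declarations)
        for imported in self.imports:
            names |= {n for n, v in imported.declarations.items() if v == PUBLIC}
        if self.parent is not None:
            names |= {n for n, v in self.parent.declarations.items()
                      if v in (PUBLIC, PROTECTED)}
        return names


tree = Package("BinaryTree",
               {"Insert": PUBLIC, "Rebalance": PROTECTED, "NODE": PRIVATE})
client = Package("Client", {"Run": PUBLIC}, imports=[tree])
avl = Package("AvlTree", {"Rotate": PRIVATE}, parent=tree)
```

In this sketch the importing package Client sees only Insert, whereas the inheriting package AvlTree additionally sees the protected declaration Rebalance.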

During the implementation of the package structure, it emerged that data abstraction
contradicts visual programming. When the graph schema of a package is strictly hid- 
den, the graph transformations of the importing package can not be specified visually by 
using the schema of the other package. Transformations of the importing package can 
only invoke methods in a textual way. On the other hand, when the graph schema is laid 
open, the importing package can change the graph schema managed by the imported 
package. This might lead to inconsistencies, for example if attributes are overwritten 
or modified. To solve the so-called graph rewriting dilemma[15], specifiers can utilize
visibility attributes to restrict the access to the graph schema. In addition, declaration 
markings have been introduced which distinguish between read-only and read/write graph
schema elements. Furthermore, three different types of integrity constraints can be used 
to guarantee consistency: constraints for pre- and post-conditions of graph transforma- 
tions, schema constraints, and global constraints for general graph consistencies. If an 
integrity constraint has been violated, the PROGRES runtime environment ensures that 
a so-called repair action is executed. The repair action is part of the respective
constraint definition.

As packages have been used in many specifications, two more problems have be- 
come obvious. First, transitive inheritance of packages is not possible. For example, 
when package B inherits from package A and package C inherits from package B, then 
the schema and the transformations from A are not known in C. The problem is not a 
conceptual, but rather a technical one. Second, the schema view of PROGRES has to 
be adapted to the package structure. Even though these are minor problems - which do 
not limit the usability of packages - we will try to solve them. 

3.2 Modularization 

Although PROGRES provides a package structure, the PROGRES environment is not 
able to import packages from other specifications. Again, this is a technical limitation. 
Only packages within the same specification can be extended or imported. This limita- 
tion makes it on the one hand impossible to split large specifications into several parts. 
For example, the AHEAD meta-model presented in Section 2 consists of three parts for 
the tasks, resources, and products of a development process. Therefore, the separation 
of these parts into different specifications is desirable and leads to a clear structure and 
comfortable handling of the huge AHEAD specification. On the other hand, specifiers 
want to reuse parts of specifications, for example a model of a binary tree from a pre- 
vious project. Thus, one of our goals is to eliminate this limitation by supporting the 
import of packages from other specifications. 

To solve the mentioned problems, modularization has to be introduced into the 
PROGRES language. As a result, we can not only manage large specifications more easily,
but also structure complex problems better by distributing them over several graphs. 
Together with this enhancement we will have to provide support for graph-boundary 
crossing relations and graph hierarchies. While the support for boundary crossing re- 
lations can be partially hidden in the specification - the import of a specification au- 
tomatically creates such a relation - we will have to provide new language elements 
for supporting graph hierarchies. However, we are still investigating if and how these 
concepts can be integrated into PROGRES using the existing package structure. 

As this section has shown, specifiers have to be supported by means of packages
and modularization. With inheritance, nesting, and the definition of graph classes,
packages provide an adequate mechanism for handling large specifications more easily. While this
technique enables a clear structure of one specification, modularization allows the
splitting of a specification into several parts. In this way individual components can be
reused in other specifications.



4 Introducing Support for Distributed Specifications 

An extension[16] of the AHEAD system supports distributed development processes.
Two or more organizations use their own AHEAD instance, which are coupled at run- 
time. The basic idea of the extension is to propagate modifications of a task net to other 
AHEAD instances by events. This approach suffers from the problem that PROGRES 
does not support events on its own. However, the GRAS database system utilized by 
PROGRES prototypes uses events to indicate graph manipulations. Hence, the commu- 
nication between the different AHEAD instances is realized by simply creating nodes 
representing events within the PROGRES specification. Then, GRAS creates appro- 
priate events, which are transmitted to the other instances using the hand-coded com- 
munication server of the AHEAD prototype. In addition, the communication server is 
responsible for storing events, if an AHEAD instance is not online and thus can not 
receive events. A remote link manager, which is present in every prototype, receives 
events from the GRAS database, transmits them to the communication server, and, in the
opposite direction, invokes the appropriate graph transformation for a received event. The transformation
RI_ReleaseOutput, which releases a document after it has been consumed by a
task, is an example of this approach:

transaction RI_ReleaseOutput ( Token : SEM_TOKEN ;
                               dfTarget : SEM_TASK ; SessionID : SESSION ) =
  use eventNode : EVENT
  do
    IA_DF_ReleaseOutput ( Token, dfTarget, SessionID )
    & choose
        REM_MarkAsTransfered ( Token )
        & EVT_RaiseRelationEvent ( Token, dfTarget, EventTransfer,
                                   out eventNode )
      else
        EVT_RaiseRelationEvent ( Token, dfTarget,
                                 EventReleaseOutput, out eventNode )
      end
  end
end;



Because the whole coupling logic remains in the specification, we generate the re- 
mote link manager automatically from the specification. A naming convention deter- 
mines which transformation has to be executed in response to a certain event. The im- 
plementation of the communication server is fairly generic and may be reused by other 
prototypes which apply the same coupling mechanism. However, up to now AHEAD is 
the only prototype using this mechanism. Due to the naming conventions, the
applicability of the concept in other specifications has to be analyzed.

The aforementioned approach has some drawbacks: Events have to be created and 
orphaned events must be removed, external programs have to be written to implement 
the communication, and implementation details clutter the specification. Obviously, 
adding appropriate language elements to PROGRES will solve these problems. Our first 
step is to propose language constructs for defining, raising, and intercepting events. 

Before events can be raised, we have to define them. The following construct is 
used to specify an event class, which can have a super class. Each event class may 
define attributes which contain information that is transmitted together with the event. 

EventClassDecl ::= "event" DeclEventID [ "is_a" ApplEventID ]
                     [ OptAttDeclList ]
                   "end"



The attribute definition not only allows intrinsic attributes, but also derived and 
meta attributes. Attributes can be redefined in sub classes. The ability to use derived 
attributes is crucial for the coupling of different prototype instances. Certain attributes 
- especially node identifiers - do not make much sense in other instances. With derived 
attributes meaningful values can be computed for these attributes. For example, a node 
representing a task is not present in the other instances and thus has no meaning to them. 
But, the task’s name - which can be computed by a derived attribute - can be used in the 
communication between the instances. After the definition, our specification can raise 
these events with the following statement: 

EventRaiseStat ::= "raise" [ CouplingMode ]
                     ApplEventID OptActParList

CouplingMode ::= "immediate" | "deferred" | "decoupled"

If intrinsic attributes are defined for the event, their concrete values must be set in the 
parameter list in the order of declaration. The derived attributes are evaluated directly 
after the values of the intrinsic attributes have been set. PROGRES executes each graph 
transformation in its own transaction. In the specification we can define how the execu- 
tion of an event is coupled to the transaction. Hence, together with the raise statement 
one of three coupling modes can be defined. The mode immediate raises the event immediately
after the execution of the statement. The coupling mode decoupled raises the event 
only after the transaction has committed its changes successfully. Between these two 
modes is the default mode deferred, which raises the event just before the transac- 
tion is about to commit. 
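
The difference between the three coupling modes can be made concrete with a small Python sketch. It is not the PROGRES runtime; the classes CouplingMode and Transaction, the log format, and the event name EventAudit are assumptions introduced for illustration. The sketch also assumes that events which have not yet been delivered are discarded when a transaction aborts, which the text above does not state explicitly.

```python
from enum import Enum


class CouplingMode(Enum):
    IMMEDIATE = "immediate"
    DEFERRED = "deferred"
    DECOUPLED = "decoupled"


class Transaction:
    """Toy transaction that records when raised events are delivered."""

    def __init__(self, log):
        self.log = log
        self._deferred = []
        self._decoupled = []

    def raise_event(self, name, mode=CouplingMode.DEFERRED):
        if mode is CouplingMode.IMMEDIATE:
            self.log.append(("event", name))   # directly after the raise statement
        elif mode is CouplingMode.DEFERRED:
            self._deferred.append(name)        # delivered just before the commit
        else:
            self._decoupled.append(name)       # delivered only after a successful commit

    def commit(self):
        for name in self._deferred:
            self.log.append(("event", name))
        self.log.append(("commit", None))
        for name in self._decoupled:
            self.log.append(("event", name))

    def abort(self):
        # Assumption: pending events are dropped when the transaction fails.
        self._deferred.clear()
        self._decoupled.clear()
        self.log.append(("abort", None))


log = []
tx = Transaction(log)
tx.raise_event("EventTransfer", CouplingMode.DECOUPLED)
tx.raise_event("EventReleaseOutput")                  # default mode: deferred
tx.raise_event("EventAudit", CouplingMode.IMMEDIATE)  # hypothetical event
tx.commit()
```

After the commit, the log lists EventAudit first, then EventReleaseOutput, then the commit itself, and finally EventTransfer.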

As a complementary operation, we have a construct to intercept events and perform 
a series of actions. The attributes transmitted with the event can be used as if they were 
local variables. 

OnEventDecl ::= "on_event" ApplEventID "=" StatExpr "end" 

With these basic language constructs, the specifier can leave the event handling
to the PROGRES runtime environment. Previously, the specifier had to manage the event handling
manually, as shown in [16]. With the help of the event mechanism the graph
transformation RI_ReleaseOutput is now realized like this:

transaction RI_ReleaseOutput ( Token : SEM_TOKEN ;
                               dfTarget : SEM_TASK ; SessionID : SESSION ) =
  use eventNode : EVENT
  do
    IA_DF_ReleaseOutput ( Token, dfTarget, SessionID )
    & choose
        REM_MarkAsTransfered ( Token )
        & raise EventTransfer ( Token, dfTarget )
      else
        raise EventReleaseOutput ( Token, dfTarget )
      end
  end
end;



Compared to the earlier version of the transformation, little has changed.
However, behind the scenes the changes are far-reaching. The specification gets simpler
because the event handling is realized within the PROGRES environment and not the 
specification. Besides that, we can now explicitly define the actions which are executed 
when an event is intercepted. In the former specification there was a one-to-one match 
between the event’s node type and the name of the transaction, which is executed for 
this event. Finally, the realization of the prototype gets easier, because we no longer 
need the implementation of the communication server and the remote link manager, 
which consist of 3000 lines of Java code and 1000 lines of XSL transformation scripts. 
All these components are now part of the PROGRES runtime environment. 

Some of the problems described earlier can not be solved with this mechanism. 
Still, parts of the specification have to be modified. An adequate solution lies in rais- 
ing events when a specific transaction is executed or certain graph modifications are 
made. Investigation of the AHEAD specification for distributed development processes 
showed that it is not that simple: For example, events may only be raised in very specific 
situations, thus conditions are necessary. Additionally, sometimes further transforma- 
tions have to be executed before the event can be raised, as seen in the transformation 
RI_ReleaseOutput. There the transformation REM_MarkAsTransfered has to
be executed to indicate that the token has been transferred. Another interesting extension
to the event mechanism would be to raise events when a certain graph pattern has been 
created. Supporting this extension would be quite hard, because the graph pattern can 
be created by a series of transformations. 

Of course, the new event concept for PROGRES is not limited to coupling different
specifications. The concept could also be used to realize reactive specifications which
are executed in one prototype. We will first realize the aforementioned constructs and 
replace the event mechanism in the AHEAD specification. Afterwards, we will inves- 
tigate which extensions should be realized to ease the specification of distributed and 
reactive systems. As long as PROGRES is not able to import other specifications, event 
definitions can not be shared and they have to be replicated in all participating specifi- 
cations. 

Fig. 2. System structure of the Gras/GXL database management system.

Even though PROGRES specifications can be coupled with the introduction of
events, the specifier still has to worry about the “completeness” of the coupling.
For example, the specifier does not know whether events are raised at all places in the specification
where they should be. The problem is that the PROGRES environment can not provide
any support for this. Therefore, we are investigating a second approach: the distribu- 
tion of specifications can also be achieved by having a PROGRES specification which 
imports two other specifications and defines correspondences between them. Based on 
the correspondences, the coupling could be generated by the PROGRES environment 
automatically. But, we believe that this is not sufficient and the incorporation of other 
concepts - like the event mechanism described earlier - is still necessary. At the
moment, we are studying approaches like triple graph grammars[17] and integrators[18,
19] to figure out how they fit into this vision.



5 Gras/GXL: A DBMS for Distributed Graph-Based Applications 

Integrated development environments and visual language editors often use graphs or 
graph-like structures as data structures for their documents. Storing these graphs in a 
graph-oriented database management system has many advantages compared to storing 
them in main memory - data integrity, virtually unlimited graph size, etc. Thus, our 
department started the development on the graph-oriented database system GRAS in 
1984. At that time, commercial database management systems did not provide features 
like incremental attribute evaluation or undo / redo of graph modifications which are 
especially used by PROGRES. 

In the last few years PROGRES specifications and the graphs created by them got 
so large that certain limitations of GRAS hinder further development of both. At this 
time, the development of GRAS’ successor Gras/GXL[20] has started. Unlike GRAS, 
Gras/GXL does not rely on its own storage management or transaction handling. Instead 
Gras/GXL uses third-party components whenever possible: commercial databases are 
used for storing graphs and transactional consistency is ensured by CORBA-compliant 
transaction managers. 

Figure 2 illustrates the system structure of the Gras/GXL database management
system. The Gras/GXL kernel defines the interfaces for the generic graph model and 
the graph schema. Storage modules tailored towards specific databases implement these 
two interfaces to store the graphs. Thanks to these storage modules, Gras/GXL
supports not only object-oriented and relational databases; graphs can also be stored
in main memory. At runtime these modules are plugged into the kernel. In addition, an
abstraction layer to third-party components like transaction managers and event services 
is provided, to decouple the kernel from specific service implementations. 

On top of the Gras/GXL kernel extensions can be implemented which provide ser- 
vices not available in the kernel. Examples for extensions are incremental evaluation of 
attributes, versioning, references to graph elements, etc. Extensions may be combined 
freely. But, some extensions may depend on others to realize their functionality, for 
example the undo/redo extension depends on the graph versioning extension. Another 
restriction for extensions is that they must implement at least the interfaces of the generic
graph model and the graph schema. The Gras/GXL graph model is richer than the ones 
used by most graph-based applications - for example n-ary relations are not supported 
by PROGRES. Thus, most applications will implement their own specific graph model 
on top of the Gras/GXL extensions to hide concepts they do not use. At the moment the 
specific graph model and its mapping to the Gras/GXL graph model have to be imple- 
mented manually, as we have already done for the PROGRES graph model. Currently, 
we are investigating how a specific graph model and its mapping can be generated from 
UML class diagrams. 

5.1 Gras/GXL: Graph Model and Graph Schema 

The UML class diagram for the Gras/GXL graph model is shown in Figure 3. A graph 
pool stores an arbitrary number of graphs which are identified by their roles. Graphs 
are a special kind of graph element - like nodes, edges, and relations. Each graph 
contains an arbitrary number of graph elements and other graphs. Graph elements can 
have an arbitrary number of attributes and meta-attributes. Because meta-attributes are 
not presented to the user, they are commonly used to store management information 
required by an extension, for example the cardinality of attributes for the PROGRES 
graph model. Relations and edges - which are just a shortcut for binary relations - 
connect graph elements stored in possibly different graph pools. As explained before, 
graphs are just ordinary graph elements in our graph model. Thus, they can be visited by 
edges and relations directly without using special graph elements. Edges and relations 
can be ordered. Hierarchical graphs are created either by a containment relationship or 
by graph-valued attributes. The containment relationship is used if a graph should be 
contained in another graph. Graph-valued attributes are used in all other situations, for 
example if a node should contain a graph. The use of graph-valued attributes together 
with the containment relationship allows us to create arbitrary hierarchical graphs and 
handle even complex situations uniformly - like hierarchies of graphs stored in different 
databases. As a result we get a clean and efficient realization of graph hierarchies. 
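
The two ways of building hierarchical graphs can be sketched in Python as follows. This is not the Gras/GXL API; the classes GraphElement, Node, and Graph, the method nested_graphs, and the sample names are hypothetical, and relations, edges, and graph pools are omitted. The sketch assumes an acyclic hierarchy.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class GraphElement:
    name: str
    attributes: dict = field(default_factory=dict)


@dataclass(eq=False)
class Node(GraphElement):
    pass


@dataclass(eq=False)
class Graph(GraphElement):
    """A graph is itself a graph element and may contain other graphs."""
    elements: list = field(default_factory=list)

    def add(self, element):
        self.elements.append(element)
        return element

    def nested_graphs(self):
        """Yield graphs reachable by containment or by graph-valued attributes."""
        for element in self.elements:
            if isinstance(element, Graph):          # containment relationship
                yield element
                yield from element.nested_graphs()
            for value in element.attributes.values():
                if isinstance(value, Graph):        # graph-valued attribute
                    yield value
                    yield from value.nested_graphs()


project = Graph("project")
design = project.add(Graph("design"))               # a graph contained in a graph
task = project.add(Node("task"))
task.attributes["realization"] = Graph("subnet")    # a node that contains a graph
names = [g.name for g in project.nested_graphs()]
```

Both kinds of nesting are traversed by the same method, which mirrors the uniform handling of hierarchies described above.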

The first version of the Gras/GXL graph model we presented in [20] supported ref- 
erences to graph elements. We have now dropped this support from the graph model, 
because it caused a significant performance loss even when references were not used. If an
application requires references, they can be implemented on top of the graph model using
our extension mechanisms. Thus, other applications no longer suffer from the inevitable
performance loss.

Fig. 3. Gras/GXL graph model.

The Gras/GXL graph model requires that every graph element has a type. Thus 
for each class in Figure 3 a corresponding graph schema class exists. Interesting in 
this paper are only two classes: GraphEntityClass and GraphElementClass. 
GraphEntityClass introduces attributes and meta-attributes and provides methods 
to declare, undeclare and enumerate them. GraphElementClass extends
GraphEntityClass and adds multiple inheritance and support of abstract classes. For all
concrete graph elements a companion class exists, for example a NodeClass for a Node.
Gras/GXL provides no means to support the structuring of a graph schema.
Basically, two reasons speak against it. First, a sophisticated structuring concept is likely
to restrict the realization of a specific graph model. And second, basic support for struc- 
turing a schema would result in merely managing lists of classes, which clutters the 
interface and database support is unlikely to result in a performance gain. Thus, con- 
cepts for structuring a graph schema have to be implemented within the application 
graph model. 

5.2 Gras/GXL: Support for Specifying in the Large 

The graph-oriented database GRAS reduces the complexity of the PROGRES runtime 
environment, because some problems can be solved more easily. For example, static 
paths - which are essential for most specifications - are implemented using
incrementally evaluated attributes; backtracking is realized by using the undo and redo
operations of GRAS. The solutions developed and suggested in earlier sections demand
database support to ease their realization in the PROGRES runtime environment.
Most of these required features are already present in Gras/GXL.

Generic Event Service: Events allow prototypes to exchange information about their 
state and modifications. The event service of Gras/GXL supports communication with 
arbitrary events and is not limited to the propagation of graph modifications. Events 
are typed and may carry any number of attributes. The event service compares each
event against a set of rules, each of which consists of an event pattern and an action. If the
event matches an event pattern, the corresponding action is executed. Unlike its prede- 
cessor GRAS, Gras/GXL is aware of the transactional context and provides different 
modes (immediate, deferred, and decoupled) to couple the execution of the action to 
the transaction. Because an execution condition can be checked easily within an action, 
Gras/GXL only supports EA (event action) rules, and not ECA (event condition action) 
rules as introduced in [21].
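
The idea of EA rules with the condition folded into the action can be illustrated by the following Python sketch. It does not reproduce the Gras/GXL event service; the classes Event, Rule, and EventService, the form of the event pattern (an event type plus required attribute values), and the attribute names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Event:
    type: str
    attributes: dict = field(default_factory=dict)


@dataclass
class Rule:
    event_type: str                   # pattern: required event type
    required: dict                    # pattern: attribute values that must match
    action: Callable[[Event], None]   # executed when the pattern matches

    def matches(self, event):
        return (event.type == self.event_type and
                all(event.attributes.get(k) == v for k, v in self.required.items()))


class EventService:
    def __init__(self):
        self.rules = []

    def publish(self, event):
        for rule in self.rules:
            if rule.matches(event):
                rule.action(event)


released = []


def on_release(event):
    # The execution condition is checked inside the action (EA instead of ECA).
    if event.attributes.get("size", 0) > 0:
        released.append(event.attributes["token"])


service = EventService()
service.rules.append(Rule("EventReleaseOutput", {"target": "Design"}, on_release))
service.publish(Event("EventReleaseOutput", {"target": "Design", "token": "t1", "size": 3}))
service.publish(Event("EventReleaseOutput", {"target": "Design", "token": "t2", "size": 0}))
service.publish(Event("EventTransfer", {"target": "Design", "token": "t3", "size": 1}))
```

Only the first event passes both the pattern and the condition inside the action; the second matches the pattern but fails the condition, and the third does not match the pattern at all.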

Identification and Lookup of Prototypes: Gras/GXL provides a unique identifier for 
each graph and schema entity. The identifier consists not only of a simple number, but 
also contains the complete identification for the database which stores the entity by a so- 
called data source URL. Thus, graph elements stored in other graphs can be identified 
and accessed from other prototypes. 
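
A minimal Python sketch of such a two-part identifier is given below. The class EntityId, its field names, and the URL format are invented for illustration and do not reflect the actual Gras/GXL identifier layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityId:
    """Hypothetical identifier: database location plus a local number."""
    data_source_url: str   # identifies the database which stores the entity
    local_number: int      # number that is unique within that database

    def is_local_to(self, data_source_url):
        return self.data_source_url == data_source_url


a = EntityId("gras://host-a/projectdb", 42)   # the URL scheme is an assumption
b = EntityId("gras://host-b/projectdb", 42)
```

The two identifiers share the same number but remain distinct, because the data source URL is part of the identity.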

Structuring Complex Graphs: Gras/GXL provides several means to create graph 
hierarchies and to structure complex graphs, as explained earlier. If the graphs get too 
large, they can be distributed over several databases. 

Graph-boundary Crossing Relations and References: Often, graphs are used to rep- 
resent documents. Dependencies within and between them are expressed by edges or 
relations. When the different documents are distributed on many graphs and databases, 
boundary crossing relations are necessary to maintain these dependencies. In addition, 
these relations are used to couple prototypes. Fortunately, Gras/GXL supports bound- 
ary crossing edges. If one prototype has to access information managed by the other, 
for example to ensure document consistencies, distributed transactions come into play. 
Currently, we are investigating how the database can support this and how distributed 
transactions can be integrated into the PROGRES language. Instead of boundary
crossing relations, references to graph elements can be used. However,
references are only supported through extensions.

Until now, the development for specifying in the large has just started and only a 
few requirements have been identified. The Gras/GXL database management system 
supports these requirements through its generic graph model, graph schema, and exten- 
sibility. 



6 Related Work 

In Sections 3 and 4 we showed the problems caused by large specifications. These
problems are not unique to the field of graph transformation systems. Modeling languages
like the Unified Modeling Language [13] or programming languages like Ada[22] offer
a couple of concepts to aid the solution of such problems. However, besides PROGRES
only few graph transformation systems have developed concepts for structuring huge
specifications.

6.1 Graph Transformation Systems

With story diagrams Fujaba[23] introduced a new notation for graph rewriting rules, 
which are a combination of UML activity and collaboration diagrams. Fujaba is tar- 
geted towards the Java programming language and thus utilizes Java packages to struc- 
ture specifications. However, Java packages do not support package extension by in- 
heritance or similar concepts. PROGRES packages are used to model the similarity 
of graph classes. In contrast, Java packages can only structure a software system - or 
specification in the case of Fujaba - but not more. A plug-in developed at the Technical 
University of Darmstadt aims to bring MOF 2.0[24] support to Fujaba. At the moment, 
Fujaba is based on UML 1. Full support for packages - like package merge or
inheritance - seems to be out of the scope of the plug-in. We intend to fill the resulting gap
with the package concepts developed for PROGRES and implement a Fujaba plug-in 
for this. 

Another example of a graph transformation system is GRACE (GRAph and rule 
CEntered specification language) [25]. Various researchers from different German uni- 
versities work on the development of GRACE. GRACE is approach independent due 
to the consideration of graph transformations as a uniform framework. Many different
and competing graph transformation approaches are available in the literature[1],
so the specifier can choose the type of graphs, rules, and rule applications according to
his taste. The main syntactic entities in GRACE are transformation units, which are 
composed of rules, an initial and a terminal graph class expression, a control condition, 
and import components. With their help, binary relations on graphs can be modeled. 
To handle large specifications, the transformation units can be structured into modules, 
which can again be imported by other modules. Additionally, a concurrent semantics of 
transformation units is defined in GRACE which provides the simultaneous execution 
of imported transformation units and rules to a graph. Therefore, the graph is divided 
into several graphs; a boundary graph contains all overlapping parts. The boundary 
graph can only be read and must not be changed. Furthermore, distributed graph
transformations[26] can be realized through distributed transformation units, which consist
of local transformation units that are connected by interface units. In [27] a different 
approach for the realization of hierarchically distributed graph transformation is pre- 
sented. Within a network graph, the so-called network nodes constitute local systems 
which are again modeled as graphs. They are connected through relations in the net- 
work graph which can specify consistency constraints. All transformations within the 
distributed system are described with hierarchical distributed graph (HD-graph) pro- 
ductions. The productions consist of a network production and, if also the local systems 
should be transformed, local productions. In the future, we will analyze the mentioned
concepts and investigate if they can be adapted to PROGRES.

Other examples for graph transformation systems are AGG[28] and DiaGen[29]. 
Until now, both systems do not provide any modularization concepts. 

6.2 Unified Modeling Language 2.0 

The Unified Modeling Language is a general purpose language for modeling artifacts 
of systems. As these models can get very large, UML provides packages to make these 
artifacts manageable. Most UML elements can be contained in a package and can be
accessed by their fully qualified names. The visibility of an element limits its accessibility
from other packages. For the sake of simplicity we will ignore the effect of an element's
visibility in the following description of UML packages.

Elements from one package can be imported into another. The import relationship
makes the element directly visible in the importing package and the element can be
accessed by its fully qualified name. Another relationship between packages is the package
merge. The UML 2.0 Infrastructure [30] distinguishes between two different kinds
of package merge semantics: define and extend. For extend, elements from the merged 
packages with the same kind and name are merged into a single element using special- 
ization and redefinition, respectively. Extend can be seen as a short-hand for explicitly 
defining the appropriate specializations and generalizations. In contrast, for a package 
merge of kind define the contents of the merged package are copied into the merging 
package using deep copy, where applicable. 

The PROGRES packages, their relation to UML packages and other modularization 
concepts, are discussed in [14]. Currently, we are investigating the influences of the 
recent developments of UML 2.0 on PROGRES packages. 

7 Conclusion 

Although PROGRES is a very complex environment which has been extended in several
dissertations, it still lacks extensions such as modularization. As the
AHEAD specification demonstrates, PROGRES needs comfortable handling
of large specifications. On the one hand, this includes an elaborate
package structure which allows the modeling of graph classes. On the other hand,
PROGRES lacks a modularization concept that would allow parts of a specification to be
reused in other specifications. Furthermore, distributed graph transformations are necessary,
which can be realized through an event mechanism. Therefore, we have introduced new language
elements for defining and handling events. Gras/GXL provides the foundation for realizing
the specification-in-the-large and distribution concepts within the PROGRES runtime
environment.

We will analyze the mentioned approaches and investigate which concepts can be
adapted to PROGRES. As PROGRES is a very large and complicated environment,
the planned extensions to this environment must be well considered and coherent with all
other implemented concepts. After the realization of the new concepts, their influence
on specification styles has to be examined. As several researchers at our department
use PROGRES to specify tools from various domains, the applicability can be tested
locally. Afterwards, we will introduce adequate mechanisms to FUJABA, because
FUJABA lacks support for distributed graph transformations.

References 

1. Rozenberg, G., et al., eds.: Handbook of Graph Grammars and Computing by Graph
Transformation. Volume 1-3. World Scientific, Singapore (1997)

2. Schürr, A.: Operationales Spezifizieren mit programmierten Graphersetzungssystemen.
Dissertation, RWTH Aachen (1991)






3. Zündorf, A.: Eine Entwicklungsumgebung für PROgrammierte GRaphErsetzungsSysteme.
Dissertation, RWTH Aachen (1995)

4. Kiesel, N., Schürr, A., Westfechtel, B.: GRAS, a graph-oriented (software) engineering
database system. Information Systems 20 (1995) 21-51

5. Böhlen, B., Jäger, D., Schleicher, A., Westfechtel, B.: UPGRADE: A framework for building
graph-based interactive tools. In Mens, T., Schürr, A., Taentzer, G., eds.: Graph-Based
Tools (GraBaTs 2002). Volume 72 of Electronic Notes in Theoretical Computer Science.,
Barcelona, Spain, Elsevier Science Publishers (2002)

6. Jäger, D., Schleicher, A., Westfechtel, B.: AHEAD: A graph-based system for modeling and
managing development processes. [31] 325-339

7. Gatzemeier, F.: CHASID - A Semantic-Oriented Authoring Environment. Dissertation, 
RWTH Aachen (to appear in 2004) 

8. Marburger, A., Westfechtel, B.: Graph-based reengineering of telecommunication systems. 
[32] 270-285 

9. Schürr, A., Winter, A.J.: UML packages for programmed graph rewriting systems. [33]
396-409

10. Westfechtel, B.: Using programmed graph rewriting for the formal specification of a
configuration management system. [34] 164-179

11. Heimann, P., Joeris, G., Krapp, C.A., Westfechtel, B.: DYNAMITE: Dynamic task nets
for software process management. In: Proceedings of the 18th International Conference
on Software Engineering (ICSE’96), Berlin, Germany, IEEE Computer Society Press, Los
Alamitos, CA, USA (1996) 331-341

12. Krapp, C.A., Krüppel, S., Schleicher, A., Westfechtel, B.: Graph-based models for managing
development processes, resources, and products. [33] 455-474

13. Rumbaugh, J., Jacobson, I., Booch, G.: The Unified Modelling Language Reference Manual. 
Object Technology Series. Addison Wesley, Reading, MA, USA (1999) 

14. Schürr, A., Winter, A.J.: UML Packages for PROgrammed Graph REwriting Systems. [33]

15. Winter, A.J.: Visuelles Programmieren mit Graphtransformationen. Dissertation, RWTH
Aachen (2000)

16. Heller, M., Jäger, D.: Graph-based tools for distributed cooperation in dynamic development
processes. [35] 352-368

17. Schürr, A.: Specification of graph translators with triple graph grammars. [34] 151-163

18. Becker, S.M., Haase, T., Westfechtel, B.: Model-based a-posteriori integration of engineering 
tools for incremental development processes. Journal of Software and Systems Modeling 
(2004) to appear. 

19. Enders, B.E., Heverhagen, T., Goedicke, M., Tropfner, P., Tracht, R.: Towards an integration
of different specification methods by using the viewpoint framework. Transactions of the
SDPS 6 (2002) 1-23

20. Böhlen, B.: Specific graph models and their mappings to a common model. [35] 45-60

21. Hsu, M.C., Ladin, R., McCarthy, D.: An execution model for active DB management
systems. In Beeri, C., Dayal, U., Schmidt, J.W., eds.: Proceedings of the 3rd International
Conference on Data and Knowledge Bases - Improving Usability and Responsiveness,
Jerusalem, Israel, Morgan Kaufmann, San Francisco, CA, USA (1988) 171-179

22. ISO/IEC 8652:1995: Annotated Ada Reference Manual. Intermetrics, Inc. (1995)

23. Fischer, T., Niere, J., Torunski, L., Zündorf, A.: Story diagrams: A new graph rewrite
language based on the Unified Modelling Language and Java. [33]

24. Object Management Group, Needham, MA, USA: MetaObject Facility (MOF) Specification,
Version 2.0. (2004). URL http://www.omg.org/uml






25. Kreowski, H.J., Busatto, G., Kuske, S.: GRACE as a unifying approach to
graph-transformation-based specification. In Ehrig, H., Ermel, C., Padberg, J., eds.: UNIGRA
2001: Uniform Approaches to Graphical Process Specification Techniques. Volume 44 of
Electronic Notes in Theoretical Computer Science., Genova, Italy, Elsevier Science
Publishers (2001)

26. Knirsch, P., Kuske, S.: Distributed graph transformation units. [32] 207-222

27. Taentzer, G.: Hierarchically distributed graph transformation. In Cuny, J., Ehrig, H.,
Engels, G., Rozenberg, G., eds.: Proceedings 5th International Workshop on Graph Grammars
and Their Application to Computer Science. Volume 1073 of Lecture Notes in Computer
Science., Williamsburg, VA, USA, Springer-Verlag, Heidelberg (1995) 304-320

28. Taentzer, G.: AGG: A tool environment for algebraic graph transformation. [31]

29. Minas, M.: Bootstrapping visual components of the DiaGen specification tool with DiaGen.
[35] 398-412

30. Object Management Group, Needham, MA, USA: UML 2.0 Infrastructure Specification.
(2003). URL http://www.omg.org/uml

31. Nagl, M., Schürr, A., Münch, M., eds.: Proceedings International Workshop on Applications
of Graph Transformation with Industrial Relevance (AGTIVE’99). Volume 1779 of Lecture
Notes in Computer Science. Kerkrade, The Netherlands, Springer-Verlag, Heidelberg (2000)

32. Corradini, A., Ehrig, H., Kreowski, H.J., Rozenberg, G., eds.: Proceedings 1st International
Conference on Graph Transformation (ICGT ’02). Volume 2505 of Lecture Notes in
Computer Science. Barcelona, Spain, Springer-Verlag, Heidelberg (2002)

33. Ehrig, H., Engels, G., Kreowski, H.J., Rozenberg, G., eds.: Proceedings 6th International
Workshop on Theory and Application of Graph Transformation (TAGT’98). Volume 1764 of
Lecture Notes in Computer Science. Paderborn, Germany, Springer-Verlag, Heidelberg (1999)

34. Mayr, E., Schmidt, G., Tinhofer, G., eds.: Proceedings WG ’94 20th International
Workshop on Graph-Theoretic Concepts in Computer Science. Volume 903 of Lecture Notes in
Computer Science. Herrsching, Germany, Springer-Verlag, Heidelberg (1995)

35. Pfaltz, J.L., Nagl, M., Böhlen, B., eds.: Proceedings 2nd International Workshop on
Applications of Graph Transformation with Industrial Relevance (AGTIVE’03). Volume 3062 of
Lecture Notes in Computer Science. Charlottesville, VA, USA, Springer-Verlag, Heidelberg
(2004)



Typing of Graph Transformation Units* 



Renate Klempien-Hinrichs, Hans-Jörg Kreowski, and Sabine Kuske



University of Bremen, Department of Computer Science
P.O.Box 33 04 40, 28334 Bremen, Germany
{rena,kreo,kuske}@informatik.uni-bremen.de



Abstract. The concept of graph transformation units in its original 
sense is a structuring principle for graph transformation systems which 
allows the interleaving of rule applications with calls of imported units 
in a controlled way. The semantics of a graph transformation unit is a 
binary relation on an underlying type of graphs. In order to get a flex- 
ible typing mechanism for transformation units and a high degree of 
parallelism this paper introduces typed graph transformation units that 
transform $k$-tuples of typed input graphs into $l$-tuples of typed output
graphs in a controlled and structured way. The transformation of the 
typed graph tuples is performed with actions that apply graph transfor- 
mation rules and imported typed units simultaneously to the graphs of 
a tuple. The transformation process is controlled with control conditions 
and with graph tuple class expressions. The new concept of typed graph 
transformation units is illustrated with examples from the area of string 
parsing with finite automata. 



1 Introduction 

The area of graph transformation brings together the concepts of rules and 
graphs with various methods from the theory of formal languages and from the 
theory of concurrency, and with a spectrum of applications, see the three volumes 
of the Handbook of Graph Grammars and Computing by Graph Transforma- 
tion as an overview [15,5,7]. The key of rule-based graph transformation is the 
derivation of graphs from graphs by applications of rules. In this way, a set of 
rules specifies a binary relation of graphs with the first component as input and 
the second one as output. If graph names the class of graphs $\mathcal{G}$, the type of such
a specified relation is $\mathit{graph} \to \mathit{graph}$ where each graph is a potential input and
output. To get a more flexible typing, one can employ graph schemata or graph
class expressions $X$ that specify subclasses $\mathcal{G}(X)$ of the given class of graphs.
This allows typings of the form $I \to T$ restricting the derivations to those
that start in initial graphs from $\mathcal{G}(I)$ and end in terminal graphs from $\mathcal{G}(T)$.
Alternatively, one may require that all graphs involved in derivations stem from
$\mathcal{G}(X)$ for some expression $X$ (cf. the use of graph schemata in PROGRES [17]).
Another form of typing in the area of graph transformation can be found in 
the notion of pair grammars and triple grammars where a pair resp. a triple of 
graphs is derived in parallel by applying rules in all components simultaneously 
(see, e.g., [14,16]). 

In this paper, we propose a new, more general typing concept for graph 
transformation that offers the parallel processing of arbitrary tuples of graphs. 
Moreover, some components can be selected as input components and others as 
output components such that relations of the type $I_1 \times \cdots \times I_k \to T_1 \times \cdots \times T_l$ can
be specified. The new typing concept is integrated into the structuring concept 
of graph transformation units (see, e.g., [1,11,12]). 

The concept of graph transformation units in its original sense is a structuring 
principle for graph transformation systems which allows the interleaving of rule 
applications with calls of imported units in a controlled way. The semantics of a 
graph transformation unit is a binary relation on an underlying type of graphs 
that transforms initial graphs into terminal ones. In order to get a flexible typing 
mechanism for transformation units and a high degree of parallelism this paper 
introduces typed graph transformation units that transform $k$-tuples of typed
input graphs into $l$-tuples of typed output graphs in a controlled and structured
way. The transformation of the typed graph tuples is performed with actions 
that apply graph transformation rules and imported typed units simultaneously 
to the graphs of a tuple. The transformation process is controlled with control 
conditions and with graph tuple class expressions. The new concept of typed 
graph transformation units is illustrated with examples from the area of string 
parsing with finite automata. 

2 Typed Graph Transformation 

Graph transformation in general transforms graphs into graphs by applying 
rules, i.e. in every transformation step a single graph is transformed with a 
graph transformation rule. In typed graph transformation this operation is ex- 
tended to tuples of graphs. This means that in every transformation step a tuple 
of graphs is transformed with a tuple of rules. The graphs, the rules, and the 
ways the rules have to be applied are taken from a so-called base type which 
consists of a tuple of rule bases. A rule base is composed of graphs, rules, and a 
rule application operator. 

2.1 Rule Bases 

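A rule base $B = (\mathcal{G}, \mathcal{R}, \Longrightarrow)$ consists of a type of graphs $\mathcal{G}$, a type of rules $\mathcal{R}$,
and a rule application operator $\Longrightarrow$. In the following the components $\mathcal{G}$, $\mathcal{R}$, and
$\Longrightarrow$ of a rule base $B$ are also denoted by $\mathcal{G}_B$, $\mathcal{R}_B$, and $\Longrightarrow_B$, respectively.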

Examples for graph types are labelled directed graphs, graphs with a struc- 
tured labelling (e.g. typed graphs in the sense of [3]), hypergraphs, trees, forests, 
finite automata, Petri nets, etc. The choice of graphs depends on the kind of 
applications one has in mind and is a matter of taste.

In this paper, we explicitly consider directed, edge-labelled graphs with individual,
possibly multiple edges. A graph is a construct $G = (V, E, s, t, l)$ where
$V$ is a set of vertices, $E$ is a set of edges, $s, t\colon E \to V$ are two mappings assigning
to each edge $e \in E$ a source $s(e)$ and a target $t(e)$, and $l\colon E \to \Sigma$ is a mapping
labelling each edge in a given label alphabet $\Sigma$.

For instance the graph

[Figure: string graph consisting of a chain of edges labelled begin, a, a, b, a, end]

consists of seven nodes and six directed edges. It is a string graph which represents
the string aaba. The beginning of the string is indicated with the begin-edge
pointing to the source of the leftmost a-edge. Analogously, there is an end-edge
originating from the end of the string, i.e. from the target of the rightmost a-edge.
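
The definition $G = (V, E, s, t, l)$ and the string graph for aaba can be sketched in code. The following is only an illustrative encoding, not part of the formal framework: the class name `Graph`, the helper `string_graph`, and the node and edge identifiers are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Directed, edge-labelled graph G = (V, E, s, t, l); parallel edges are allowed."""
    vertices: set = field(default_factory=set)
    source: dict = field(default_factory=dict)  # s: E -> V
    target: dict = field(default_factory=dict)  # t: E -> V
    label: dict = field(default_factory=dict)   # l: E -> Sigma

    @property
    def edges(self):
        # E is the common domain of s, t and l.
        return set(self.label)

    def add_edge(self, edge, src, tgt, lab):
        self.vertices |= {src, tgt}
        self.source[edge], self.target[edge], self.label[edge] = src, tgt, lab


def string_graph(word):
    """Chain of letter edges framed by a begin-edge and an end-edge (hypothetical ids)."""
    g = Graph()
    g.add_edge("e_begin", "v_begin", 0, "begin")
    for i, letter in enumerate(word):
        g.add_edge(f"e{i}", i, i + 1, letter)
    g.add_edge("e_end", len(word), "v_end", "end")
    return g


g = string_graph("aaba")
print(len(g.vertices), len(g.edges))  # 7 6
```

Edges carry their own identifiers rather than being pairs of nodes, which is what allows several edges between the same two nodes.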

Another instance of a graph is the following deterministic finite state graph 
where the edges labelled with a and b represent transitions, and the sources and 
targets of the transitions represent states. The start state is indicated with a 
start-edge and every final state with a final-edge. Moreover there is an edge 
labelled with current pointing to the current state of the deterministic finite 
state graph. 




To be able to transform the graphs in $\mathcal{G}$, rules are applied to the graphs
yielding graphs again. Hence, each rule $r \in \mathcal{R}$ defines a binary relation
$\underset{r}{\Longrightarrow} \subseteq \mathcal{G} \times \mathcal{G}$ on graphs. If $G \underset{r}{\Longrightarrow} H$, one says that $G$ directly derives $H$ by applying
$r$. There are many possibilities to choose rules and their applications. Types
of rules may vary from the more restrictive ones, like edge replacement [4] or
node replacement [8], to the more general ones, like double-pushout rules [2],
single-pushout rules [6], or PROGRES rules [17].

In this paper, we concentrate on a simplified notion of double-pushout rules, 
i.e. every rule is a triple r = (L,K,R) where L and R are graphs (the left- 
and right-hand side of r, respectively) and K is a set of nodes shared by L 
and R. In a graphical representation of r, L and R are drawn as usual, with 
numbers uniquely identifying the nodes in K . Its application means to replace 
an occurrence of L with R such that the common part K is kept. In particular, 
we will use rules that add or delete a node together with an edge and/or that 
redirect an edge. 

A rule $r = (L,K,R)$ can be applied to some graph $G$ directly deriving the
graph $H$ if $H$ can be constructed up to isomorphism (i.e. up to the renaming of
nodes and edges) in the following way.

1. Find an isomorphic copy of $L$ in $G$, i.e. a subgraph that coincides with $L$ up
to the naming of nodes and edges.

2. Remove all nodes and edges of this copy except the nodes corresponding to
$K$, provided that the remainder is a graph (which holds if the removal of a
node is accompanied by the removal of all its incident edges).

3. Add $R$ by merging $K$ with its corresponding copy.

For abbreviating sets of rules, we also use variables instead of concrete labels.
For every instantiation of a variable with a label we get a rule as described above.
For example, the following rule read(x) has as left-hand side a graph consisting
of an x-edge and a begin-edge. The right-hand side consists of a
new begin-edge pointing from the source of the old begin-edge to the target of
the x-edge. The common part of the rule read(x) consists of the source of the
begin-edge and the target of the x-edge.

[Figure: rule read(x). Left-hand side: a begin-edge leaving node 1 followed by an x-edge entering node 2. Right-hand side: a begin-edge from node 1 to node 2.]

If the variable x is instantiated with a, the resulting rule read(a) can be
applied to the above string graph. Its application deletes the begin-edge and the
leftmost a-edge together with its source. It adds a new begin-edge pointing from
the source of the old begin-edge to the target of the a-edge. The resulting string
graph represents the string aba.
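
The three construction steps can be traced for read(x) with a small sketch. It is hard-wired to this one rule rather than a general double-pushout engine, and the encoding (edges as a dictionary from edge id to a (source, target, label) triple) as well as the names `string_graph`, `apply_read` and `spelled_word` are assumptions made for illustration.

```python
def string_graph(word):
    """Return (nodes, edges); edges maps an edge id to (source, target, label)."""
    nodes = {"v_begin", "v_end", *range(len(word) + 1)}
    edges = {"e_begin": ("v_begin", 0, "begin"),
             "e_end": (len(word), "v_end", "end")}
    for i, letter in enumerate(word):
        edges[f"e{i}"] = (i, i + 1, letter)
    return nodes, edges


def apply_read(nodes, edges, x):
    """Apply read(x) once; return the derived (nodes, edges) or None if L has no match."""
    for b, (n1, mid, lab_b) in edges.items():
        if lab_b != "begin":
            continue
        for e, (src, n2, lab_e) in edges.items():
            # Step 1: find a copy of L, i.e. a begin-edge followed by an x-edge.
            if lab_e != x or src != mid or len({n1, mid, n2}) < 3:
                continue
            # Step 2: the middle node is deleted, so it may only touch matched edges.
            incident = {k for k, (s, t, _) in edges.items() if mid in (s, t)}
            if incident != {b, e}:
                continue
            rest = {k: v for k, v in edges.items() if k not in (b, e)}
            # Step 3: add R, a begin-edge between the kept nodes 1 and 2.
            rest[b] = (n1, n2, "begin")
            return nodes - {mid}, rest
    return None


def spelled_word(edges):
    """Read the word back by following the chain from the begin-edge to the end-edge."""
    step = {src: (tgt, lab) for src, tgt, lab in edges.values()}
    node = next(tgt for _, tgt, lab in edges.values() if lab == "begin")
    letters = []
    while step[node][1] != "end":
        letters.append(step[node][1])
        node = step[node][0]
    return "".join(letters)


nodes, edges = string_graph("aaba")
new_nodes, new_edges = apply_read(nodes, edges, "a")
print(spelled_word(new_edges))  # aba
```

Reusing the id of the deleted begin-edge for the new one is harmless here, because derived graphs are only determined up to the renaming of nodes and edges.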

The following rule go(x) redirects a current-labelled edge from the source of
some x-labelled edge to the target of this edge.

[Figure: rule go(x), which moves the current-edge from the source of an x-edge to its target]



If x is instantiated with a, its application to the above deterministic finite 
state graph results in the same deterministic finite state graph except that the 
current state is changed to the start state. 

2.2 Graph Tuple Transformation 

As the iterated application of rules transforms graphs into graphs yielding an 
input-output relation, the natural type declaration of a graph transformation 
in a rule base $B = (\mathcal{G}, \mathcal{R}, \Longrightarrow)$ is $B : \mathcal{G} \to \mathcal{G}$. But in many applications one
would like to have a typing that allows one to consider several inputs and maybe 
even several outputs, or at least an output of a type different from all inputs. 
Moreover, one may want to be able to transform subtypes of the types of input 
and output graphs. In order to reach such an extra flexibility in the typing of 
graph transformations we introduce in this section the transformation of tuples 
of typed graphs, which is the most basic operation of the typed graph
transformation units presented in Section 4.

Graph tuple transformation over a base type is an extension of ordinary
rule application in the sense that graphs of different types can be transformed 
in parallel. For example to check whether some string can be recognized by a 
deterministic finite automaton, one can transform three graphs in parallel: The 
first graph is a string graph representing the string to be recognized, the second 
graph is a deterministic finite state graph the current state of which is the start 
state, and the third graph represents the boolean value false. To recognize the 
string one applies a sequence of typed rule applications which consume the string 
graph while the corresponding transitions of the deterministic finite state graph 
are traversed. If after reading the whole string the current state is a final state, 
the third graph is transformed into a graph representing true. This example will 
be explicitly modelled in Section 4. 

In graph tuple transformation, tuples of rules are applied to tuples of graphs.
A tuple of rules may also contain the symbol $-$ in some components where no
change is desired. The graphs and the rules are taken from a base type, which
is a tuple of rule bases $BT = (B_1, \ldots, B_n)$. Let $(G_1, \ldots, G_n)$ and $(H_1, \ldots, H_n)$
be graph tuples over $BT$, i.e. $G_i, H_i \in \mathcal{G}_{B_i}$ for $i = 1, \ldots, n$. Let $a = (a_1, \ldots, a_n)$
with $a_i \in \mathcal{R}_{B_i}$ or $a_i = -$ for $i = 1, \ldots, n$. Then $(G_1, \ldots, G_n) \underset{a}{\longrightarrow} (H_1, \ldots, H_n)$ if,
for $i = 1, \ldots, n$, $G_i \underset{a_i}{\Longrightarrow} H_i$ if $a_i \in \mathcal{R}_{B_i}$ and $G_i = H_i$ if $a_i = -$. In the following
we call $a$ a basic action of $BT$.¹ For a set $ACT$ of basic actions of $BT$, $\underset{ACT}{\longrightarrow}$
denotes the union $\bigcup_{a \in ACT} \underset{a}{\longrightarrow}$, and $\overset{*}{\underset{ACT}{\longrightarrow}}$ its reflexive and transitive closure.

a ACT 

For example, let $I$ be some finite alphabet and let $B^{basic}_{string} = (string, \{read(x) \mid x \in I\}, \Longrightarrow)$
and $B^{basic}_{dfsg} = (dfsg, \{go(x) \mid x \in I\}, \Longrightarrow)$ be two rule bases such that
string consists of all string graphs over $I$ and dfsg consists of all deterministic
finite state graphs over $I$. Let $G_1$ be the string graph representing aaba and let $G_2$
be the above deterministic finite state graph. Then $(G_1, G_2) \underset{a}{\longrightarrow} (H_1, H_2)$ for the
basic action $a = (read(a), go(a))$ of base type $(B^{basic}_{string}, B^{basic}_{dfsg})$ if $H_1$ represents
aba and $H_2$ is obtained from $G_2$ by redirecting the current-edge to the start
state.
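
A basic action can be sketched as a tuple of component-wise rule applications. To keep the example short, it assumes a much simplified encoding: a string graph is represented by the word it spells, and a finite state graph by a pair (transition table, current state). The automaton below is hypothetical, since the figure of the original finite state graph is not reproduced here, and the names `read`, `go`, `KEEP` and `apply_basic_action` are illustrative.

```python
from itertools import product


def read(x):
    """read(x) on a string graph, encoded here simply as the word still to be read."""
    return lambda word: [word[1:]] if word.startswith(x) else []


def go(x):
    """go(x) on a finite state graph encoded as (transition table, current state)."""
    def rule(dfsg):
        delta, current = dfsg
        return [(delta, delta[current, x])] if (current, x) in delta else []
    return rule


KEEP = None  # plays the role of the symbol "-": the component stays unchanged


def apply_basic_action(action, graphs):
    """All tuples (H_1, ..., H_n) that (G_1, ..., G_n) derives under the basic action."""
    options = [[g] if a is KEEP else a(g) for a, g in zip(action, graphs)]
    return list(product(*options))  # empty if some rule is not applicable


# Hypothetical automaton: the a-transition leads from s1 back to the start state s0.
delta = {("s0", "a"): "s1", ("s1", "a"): "s0", ("s0", "b"): "s0", ("s1", "b"): "s1"}
G = ("aaba", (delta, "s1"))
H = apply_basic_action((read("a"), go("a")), G)
print(H[0][0], H[0][1][1])  # aba s0
```

Each rule returns a list of results because rule application is a relation in general; in this deterministic encoding the list has at most one element.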

Let $ACT$ be the set of all basic actions of $BT = (B_1, \ldots, B_n)$. Then obviously
the following holds: $(G_1, \ldots, G_n) \overset{*}{\underset{ACT}{\longrightarrow}} (H_1, \ldots, H_n)$ if and only if $G_i \overset{*}{\Longrightarrow} H_i$ for
$i = 1, \ldots, n$. This means that the transformation of graph tuples via a sequence
of basic actions is equivalent to the transformation of tuples of typed graphs
where every component is transformed independently with a sequence of direct
derivations of the corresponding type.

In [9] the transformation of tuples of typed graphs is generalized in the sense
that the transformations are performed by a product of transformation units
instead of tuples of rules. In every transformation step of a product of
transformation units $tu_1, \ldots, tu_n$, one can apply rules as well as imported transformation
units of $tu_i$ to the $i$th graph in the current graph tuple ($i = 1, \ldots, n$). Hence, in
every transformation step the graphs of the current graph tuple are transformed
in parallel. Such a transformation step in a product unit is called an action. The
nondeterminism of actions is restricted by the control conditions and the graph
class expressions of the units $tu_1, \ldots, tu_n$. Moreover one can specify control
conditions on the level of actions. In order to get a flexible kind of typing, i.e. to
declare a sequence of input components and a sequence of output components
independently, the embedding and projection of graph products is introduced. For
the same reasons, similar operations will be introduced for typed transformation
units in this paper. The most striking difference between the product of transformation
units in [9] and the typed graph transformation units presented here is the
import component. Typed units can import other typed units whereas a product of
transformation units is composed of transformation units in the original sense,
i.e. it does not use other typed units.

¹ More sophisticated actions with more expressive power will be introduced further on.

3 Restricting the Nondeterminism 

Application of rule tuples is highly nondeterministic in general. For many appli- 
cations of graph transformation it is meaningful to restrict the number of possible 
ways to proceed with a transformation process. Hence, in order to employ typed 
transformation units meaningfully, they are equipped with graph tuple class ex- 
pressions and control conditions to restrict the number of possible sequences of 
transformation steps. 

3.1 Graph Tuple Class Expressions 

The aim of graph tuple class expressions is to restrict the class of graph tuples 
to which certain transformation steps may be applied, or to filter out a subclass 
of all the graph tuples that can be obtained from a transformation process. 
Typically, a graph tuple class expression may be some logic formula describing 
a tuple of graph properties like connectivity, or acyclicity, or the occurrence or 
absence of certain labels. In this sense, every graph tuple class expression e over 
a base type $BT = (B_1, \ldots, B_n)$ specifies a set $SEM(e) \subseteq \mathcal{G}_{B_1} \times \cdots \times \mathcal{G}_{B_n}$ of
graph tuples in $BT$.

In many cases such a graph tuple class expression will be a tuple $e =
(e_1, \ldots, e_n)$ where the $i$th item $e_i$ restricts the graph class $\mathcal{G}_{B_i}$ of the rule base
$B_i$, i.e. $SEM_{B_i}(e_i) \subseteq \mathcal{G}_{B_i}$ for $i = 1, \ldots, n$. Consequently, the semantics of $e$ is
$SEM_{B_1}(e_1) \times \cdots \times SEM_{B_n}(e_n)$. Hence, each item $e_i$ is a graph class expression
as defined for transformation units without explicit typing.

The graph tuple class expressions used in this paper are also tuples of graph
class expressions. A simple example of a graph class expression is all which
specifies for any rule base $B$ the graph type of $B$, i.e. $SEM_B(all) = \mathcal{G}_B$.
Consequently, the graph tuple class expression $(e_1, \ldots, e_n)$ with $e_i = all$ for $i = 1, \ldots, n$
does not restrict the graph types of the rule bases, i.e. $SEM((e_1, \ldots, e_n)) =
\mathcal{G}_{B_1} \times \cdots \times \mathcal{G}_{B_n}$. Another example of a graph class expression over a rule base $B$
is a set of graphs in $\mathcal{G}_B$. The semantics of a set $e \subseteq \mathcal{G}_B$ is $e$ itself. In particular
we will use the set initialized consisting of all deterministic finite state graphs
the current state of which is the start state.
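
Since a tuple of graph class expressions denotes the product of the component semantics, membership can be checked component by component. The sketch below models each graph class expression as a predicate; the dictionary encoding of a finite state graph and the names `all_graphs`, `initialized` and `in_sem` are assumptions made for illustration.

```python
def all_graphs(graph):
    """The expression `all`: every graph of the rule base's graph type is admitted."""
    return True


def initialized(dfsg):
    """Finite state graphs whose current state is the start state (hypothetical encoding)."""
    return dfsg["current"] == dfsg["start"]


def in_sem(expr, graphs):
    """Membership in SEM((e_1, ..., e_n)) = SEM(e_1) x ... x SEM(e_n)."""
    return len(expr) == len(graphs) and all(e(g) for e, g in zip(expr, graphs))


expr = (all_graphs, initialized, all_graphs)
fresh = {"start": "s0", "current": "s0"}
moved = {"start": "s0", "current": "s1"}
print(in_sem(expr, ("aaba", fresh, "FALSE")))  # True
print(in_sem(expr, ("aaba", moved, "FALSE")))  # False
```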

3.2 Control Conditions

A control condition is an expression that determines, for example, the order 
in which transformation steps may be applied to graph tuples. Semantically, it 
relates tuples of start graphs with tuples of graphs that result from an admitted 
transformation process. In this sense, every control condition $C$ over a base
type $BT$ specifies a binary relation $SEM(C)$ on the set of graph tuples in $BT$.
More precisely, for a base type $BT = (B_1, \ldots, B_n)$, $SEM(C)$ is a subset of
$(\mathcal{G}_{B_1} \times \cdots \times \mathcal{G}_{B_n})^2$.

As control conditions we use in particular actions, sequential composition,
union, and iteration of control conditions, as well as the expression
as-long-as-possible (abbreviated with the symbol !). An action prescribes which rules or
imported typed units should be applied to a graph tuple, i.e. an action is a control
condition that allows one to synchronize different transformation steps. The
basic actions of the previous section are examples of actions. Roughly speaking,
an action over a base type $BT = (B_1, \ldots, B_n)$ is a tuple $act = (a_1, \ldots, a_n)$
that specifies an $n,n$-relation $SEM(act) \subseteq (\mathcal{G}_{B_1} \times \cdots \times \mathcal{G}_{B_n})^2$. Actions will be
explained in detail in Section 4.

In particular, an action $act$ is a control condition that specifies the relation
$SEM(act)$. For control conditions $C$, $C_1$, and $C_2$ the expression $C_1; C_2$
specifies the sequential composition of both semantic relations, $C_1 | C_2$ specifies the
union, and $C^*$ specifies the reflexive and transitive closure, i.e. $SEM(C_1; C_2) =
SEM(C_1) \circ SEM(C_2)$, $SEM(C_1 | C_2) = SEM(C_1) \cup SEM(C_2)$, and $SEM(C^*) =
SEM(C)^*$. Moreover, for a control condition $C$ the expression $C!$ requires $C$ to be
applied as long as possible, i.e. $SEM(C!)$ consists of all pairs $(G, H) \in SEM(C)^*$
such that there is no $H'$ with $(H, H') \in SEM(C)$. In the following the control
condition $C_1 | \cdots | C_n$ will also be denoted by $\{C_1, \ldots, C_n\}$.

For example, let $C_1$, $C_2$, and $C_3$ be control conditions that specify $n,n$-relations
on graphs of different types. Then the expression $C_1!;\, C_2^*;\, (C_3 | C_1)$
prescribes to apply first $C_1$ as long as possible, then $C_2$ arbitrarily often, and finally
$C_3$ or $C_1$ exactly once.
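
The semantic operations on control conditions are ordinary operations on binary relations, which can be sketched over a finite universe. In the following illustration, integers stand in for graph tuples and the three relations `c1`, `c2`, `c3` are made up for the example; only the operators follow the definitions of sequential composition, union, closure, and as-long-as-possible given above.

```python
def compose(r1, r2):
    """Sequential composition: first a step of r1, then a step of r2."""
    return {(g, h) for (g, m1) in r1 for (m2, h) in r2 if m1 == m2}


def union(r1, r2):
    return r1 | r2


def star(r, universe):
    """Reflexive and transitive closure of r over a finite universe."""
    closure = {(g, g) for g in universe}
    while True:
        bigger = closure | compose(closure, r)
        if bigger == closure:
            return closure
        closure = bigger


def as_long_as_possible(r, universe):
    """All pairs (g, h) of the closure such that r admits no further step from h."""
    reducible = {g for (g, _) in r}
    return {(g, h) for (g, h) in star(r, universe) if h not in reducible}


# Integers 0..6 stand in for graph tuples; the relations are purely illustrative.
universe = set(range(7))
c1 = {(n, n - 2) for n in range(2, 7)}  # subtract 2 while possible
c2 = {(0, 1)}
c3 = {(1, 5)}

# SEM(C1!; C2*; (C3 | C1))
sem = compose(compose(as_long_as_possible(c1, universe), star(c2, universe)),
              union(c3, c1))
print(sorted(sem))
```

With these relations every start value ends in 5: applying `c1` as long as possible yields 0 or 1, `c2` may lift 0 to 1, and the final step is only possible from 1 via `c3`. A computation that stays at 0 has no final step and therefore contributes no pair.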

4 Typed Graph Transformation Units 

Typed transformation units provide a means to structure the transformation 
process from a sequence of typed input graphs to a sequence of typed output 
graphs. More precisely, a typed graph transformation unit transforms $k$-tuples of
graphs into $l$-tuples of graphs such that the graphs in the $k$-tuples as well as the
graphs in the $l$-tuples may be of different types. Hence, a typed transformation
unit specifies a $k,l$-relation on typed graphs. Internally a typed transformation
unit transforms $n$-tuples of typed graphs into $n$-tuples of typed graphs, i.e. it
specifies internally an $n,n$-relation on typed graphs. The transformation of the
$n$-tuples is performed according to a base type which is specified in the
declaration part of the unit. The $k,l$-relation is obtained from the $n,n$-relation by
embedding $k$ input graphs into $n$ initial graphs and by projecting $n$ terminal
graphs onto $l$ output graphs. The embedding and the projection are also given
in the declaration part of a typed unit.

4.1 Syntax of Typed Graph Transformation Units

Base types, graph tuple class expressions, and control conditions form the in- 
gredients of typed graph transformation units. Moreover, the structuring of the 
transformation process is achieved by an import component, i.e. every typed 
unit may import a set of other typed units. The transformations offered by an 
imported typed unit can be used in the transformation process of the importing 
typed unit. 

The basic operation of a typed transformation unit is the application of an 
action, which is a transformation step from one graph tuple into another where 
every component of the tuple is modified either by means of a rule application, or 
is set to some output graph of some imported typed unit, or remains unchanged. 
Since action application is nondeterministic in general, a transformation unit 
contains a control condition that may regulate the graph tuple transformation 
process. Moreover, a typed unit contains an initial graph tuple class expression 
and a terminal graph tuple class expression. The former specifies all possible 
graph tuples a transformation may start with and the latter specifies all graph 
tuples a transformation may end with. Hence, every transformation of an n- 
tuple of typed graphs with action sequences has to take into account the control 
condition of the typed unit as well as the initial and terminal graph tuple class 
expressions. 

A tuple of sets of typed rules, a set of imported typed units, a control con- 
dition, an initial graph tuple class expression, and a terminal graph tuple class 
expression form the body of a typed transformation unit. All components in the 
body must be consistent with the base type of the unit. 

Formally, let $BT = (B_1, \ldots, B_n)$ be a base type. A typed graph transformation
unit $tgtu$ with base type $BT$ is a pair $(decl, body)$ where $decl$ is the declaration
part of $tgtu$ and $body$ is the body of $tgtu$. The declaration part is of the form
$in \to out$ on $BT$ where $in\colon [k] \to [n]$ and $out\colon [l] \to [n]$ are mappings with
$k, l \in \mathbb{N}$.² The body of $tgtu$ is a system $body = (I, U, R, C, T)$ where $I$ and $T$
are graph tuple class expressions over $BT$, $U$ is a set of imported typed graph
transformation units, $R$ is a tuple of rule sets $(R_1, \ldots, R_n)$ such that $R_i \subseteq \mathcal{R}_{B_i}$
for $i = 1, \ldots, n$, and $C$ is a control condition over $BT$. The numbers $k$ and $l$
of $tgtu$ are also denoted by $k_{tgtu}$ and $l_{tgtu}$. Moreover, the $i$th input type $\mathcal{G}_{B_{in(i)}}$
of $tgtu$ is also denoted by $intype_{tgtu}(i)$ for $i = 1, \ldots, k$ and the $j$th output type
$\mathcal{G}_{B_{out(j)}}$ by $outtype_{tgtu}(j)$ for $j = 1, \ldots, l$.

To simplify technicalities, we assume in this first approach that the import 
structure is acyclic (for a study of cyclic imports of transformation units with a 
single input and output type see [13]). Initially, one builds typed units of level 
0 with empty import. Then typed units of level 1 are those that import only 
typed units of level 0, and typed units of level n + 1 import only typed units of 
level 0 to level n, but at least one from level n. 
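The level construction can be illustrated with a small sketch. It assumes, purely for illustration, that the import structure is given as a dictionary from unit names to the names of the units they import; the function name `import_levels` and this representation are not part of the formal definition.

```python
def import_levels(imports):
    """Map each unit name to its level in an acyclic import structure.

    `imports` maps a unit name to the set of unit names it imports
    (a hypothetical encoding). A unit with empty import has level 0;
    otherwise its level is 1 + the maximum level of its imported units.
    """
    levels = {}
    visiting = set()

    def level(unit):
        if unit in levels:
            return levels[unit]
        if unit in visiting:
            # The definition in the text excludes cyclic imports.
            raise ValueError(f"cyclic import involving {unit!r}")
        visiting.add(unit)
        used = imports.get(unit, set())
        result = 0 if not used else 1 + max(level(u) for u in used)
        visiting.discard(unit)
        levels[unit] = result
        return result

    for unit in imports:
        level(unit)
    return levels


# Import structure of the four example units discussed in Section 4.2.
example = {
    "recognize": set(),
    "reduce": set(),
    "recognize-intersection": {"recognize"},
    "recognize-substitution": {"reduce"},
}
levels = import_levels(example)
```

Taking one plus the maximum level of the imported units matches the requirement that a unit of level n + 1 imports at least one unit of level n and none of a higher level.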



² For a natural number $n \in \mathbb{N}$, $[n]$ denotes the set $\{1, \ldots, n\}$.
4.2 Examples for Typed Graph Transformation Units

Example 1. The base type of the following example of a typed transformation
unit is the tuple $(B_{string}, B_{dfsg}, B_{bool})$. The rule base $B_{string}$
is $(string, \{read(x) \mid x \in I\} \cup \{\text{is-empty}\}, \Rightarrow)$,
where the rule is-empty checks whether the graph to which it is applied
represents the empty string. It has equal left- and right-hand sides consisting
of a node that is the target of a begin-edge and the source of an end-edge.

[Figure: the rule is-empty; both sides consist of one node with an incoming
begin-edge and an outgoing end-edge.]

The rule base $B_{dfsg}$ is
$(dfsg, \{go(x) \mid x \in I\} \cup \{\text{is-final}\}, \Rightarrow)$. The rule
is-final checks whether the current state of a deterministic finite state graph
is a final state, resetting it to the start state in that case, and can be
depicted as follows.

[Figure: the rule is-final; a single node, numbered 1, carrying start and final
markings on both sides.]

The rule base $B_{bool}$ contains the graph type bool which consists of the two
graphs TRUE and FALSE, where TRUE represents the value true and FALSE
the value false. Both graphs consist of a single node with a loop that is
labelled true and false, respectively:

TRUE = (node with a true-loop)    FALSE = (node with a false-loop)

The rule type of $B_{bool}$ consists of the four rules

set-to-true:  (false-loop) ::= (true-loop)     is-true:  (true-loop) ::= (true-loop)

set-to-false: (true-loop) ::= (false-loop)     is-false: (false-loop) ::= (false-loop)

where each side is a single node, numbered 1, with a loop carrying the given
label. The rule set-to-true changes a false-loop into a true-loop, set-to-false
does the same the other way round, is-true checks whether a graph of type bool
is equal to TRUE, and is-false checks the same for FALSE.

Now we can define the typed unit recognize shown in Figure 1. It has as
input graphs a string graph and a deterministic finite state graph and as output
graph a boolean value. The mapping in of the declaration part of recognize
is defined by $in\colon [2] \to [3]$ with $in(1) = 1$ and $in(2) = 2$. We use
the more intuitive tuple notation (string, dfsg, -) for this. The mapping out is
denoted by (-, -, bool), which means that $out\colon [1] \to [3]$ is defined by
$out(1) = 3$. Hence, $intype_{recognize}(1) = string$,
$intype_{recognize}(2) = dfsg$, and $outtype_{recognize}(1) = bool$.

The initial graph tuple class expression is (string, initialized, FALSE), i.e. it
admits all tuples $(G_1, G_2, G_3) \in string \times dfsg \times bool$ where the
current-edge of $G_2$




recognize
  decl:     (string, dfsg, -) -> (-, -, bool) on ($B_{string}$, $B_{dfsg}$, $B_{bool}$)
  initial:  (string, initialized, FALSE)
  rules:    ($\mathcal{R}_{B_{string}}$, $\mathcal{R}_{B_{dfsg}}$, {set-to-true})
  cond:     $a_1!\,;\, a_2!$ where
              $a_1 = \{(read(x), go(x), -) \mid x \in I\}$ and
              $a_2$ = (is-empty, is-final, set-to-true)
  terminal: (string, dfsg, bool)

Fig. 1. A typed unit with empty import.



points to the start state and $G_3$ is equal to FALSE. The rules are restricted
to the tuple ($\mathcal{R}_{B_{string}}$, $\mathcal{R}_{B_{dfsg}}$,
{set-to-true}), i.e. just one rule from $B_{bool}$ is admitted.
The control condition requires applying first the action $a_1$ as long as
possible and then the action $a_2$ as long as possible, where $a_1$ applies
read(x) to the first component of the current graph tuple and go(x) to the
second component (for any $x \in I$). The action $a_2$ sets the third component
to TRUE if the current string is empty, the current state of the state graph is
a final state, and the third component is equal to FALSE. Note that $a_2$ can be
applied at most once because of set-to-true, and only in the case where $a_1$
cannot be applied anymore because of is-empty. The terminal graph tuple class
expression does not restrict the graph types of the base type, i.e. it is equal
to (string, dfsg, bool). The unit recognize does not import other typed units.
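The behaviour of recognize can be mimicked by a small simulation. This is only a sketch under simplifying assumptions: the string graph is modelled as a Python list of symbols and the deterministic finite state graph as a dictionary with the hypothetical keys `start`, `finals` and `delta`; no actual graph transformation is performed.

```python
def recognize(word, dfa):
    """Simulate the control condition a1!; a2! of the unit `recognize`.

    `dfa` is an assumed encoding of a deterministic finite state graph:
    {"start": state, "finals": set of states, "delta": {(state, symbol): state}}.
    """
    string = list(word)
    state = dfa["start"]   # 'initialized': the current state is the start state
    result = False         # the third component starts as FALSE

    # a1!: apply (read(x), go(x), -) as long as possible.
    while string and (state, string[0]) in dfa["delta"]:
        state = dfa["delta"][(state, string[0])]
        string.pop(0)

    # a2!: apply (is-empty, is-final, set-to-true) as long as possible.
    # set-to-true needs a FALSE value, so the loop body runs at most once.
    while not string and state in dfa["finals"] and result is False:
        state = dfa["start"]   # is-final resets the current state
        result = True
    return result


# Example automaton (an assumption): words over {a, b} with an even number of a's.
even_as = {
    "start": 0,
    "finals": {0},
    "delta": {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1},
}
```

With this automaton, `recognize("abab", even_as)` yields `True` and `recognize("ab", even_as)` yields `False`, mirroring the output graphs TRUE and FALSE.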

Example 2. The unit recognize-intersection shown in Figure 2 is an example of a
typed unit with a non-empty import component. It has as input graphs a string
graph and two deterministic finite state graphs. The output graph represents
again a boolean value. The base type of recognize-intersection is the six-tuple
($B_{string}$, $B_{dfsg}$, $B_{dfsg}$, $B_{bool}$, $B_{bool}$, $B_{bool}$). The
mapping in of the declaration part requires taking a string graph from the
first rule base of the base type, one deterministic finite state graph from the
second and one from the third rule base as input graphs. The mapping out
requires taking a graph from the last rule base as output graph.



recognize-intersection
  decl:     (string, dfsg, dfsg, -, -, -) -> (-, -, -, -, -, bool) on
            ($B_{string}$, $B_{dfsg}$, $B_{dfsg}$, $B_{bool}$, $B_{bool}$, $B_{bool}$)
  initial:  (string, dfsg, dfsg, bool, bool, FALSE)
  uses:     recognize
  rules:    ($\emptyset$, $\emptyset$, $\emptyset$, {is-true}, {is-true}, {set-to-true})
  cond:     $a_1\,;\, a_2!$ where
              $a_1$ = (-, -, -, recognize(1, 2), recognize(1, 3), -) and
              $a_2$ = (-, -, -, is-true, is-true, set-to-true)
  terminal: (string, dfsg, dfsg, bool, bool, bool)

Fig. 2. A typed unit with imported units combined in an action.
The unit recognize-intersection imports the above unit recognize and has as
local rules is-true and set-to-true, where is-true can be applied to the fourth
and the fifth component of the current graph tuples and set-to-true to the sixth
component. The control condition requires the following.

1. Apply recognize to the first and the second component and write the result 
into the fourth component and 

2. apply recognize to the first and the third component and write the result 
into the fifth component. 

3. If then possible, apply the rule is-true to the fourth and the fifth component
and the rule set-to-true to the sixth component.

This means that in the first point recognize is applied to the input string graph
and the first one of the input deterministic finite state graphs. In the second
point recognize must be applied to the input string graph and to the second
deterministic finite state graph. These two transformations can be performed in
parallel within one and the same action denoted by the tuple
(-, -, -, recognize(1, 2), recognize(1, 3), -). (The precise semantics of this
action will be given in the next subsection where actions and their semantics
are introduced formally.) The rule application performed in the third point
corresponds to applying the basic action (-, -, -, is-true, is-true, set-to-true)
as long as possible. Since the initial graph tuple class expression requires
that the sixth graph represent false, this means at most one application due to
set-to-true. The terminal graph tuple class expression admits all graph tuples
of the base type.

Example 3. Let I be the alphabet consisting of the symbols a, b, let $L, L_a, L_b$
be regular languages, and let $subst\colon I \to \mathcal{P}(I^*)$ be a
substitution with $subst(a) = L_a$ and $subst(b) = L_b$. The aim of the
following example is to model the recognition of the substitution language
$subst(L) = \{subst(w) \mid w \in L\}$ based on a description of
$L, L_a, L_b$ by deterministic finite automata. (The model can of course be
extended to arbitrarily large alphabets.)

First, consider the typed unit reduce shown in Figure 3. It takes a string
graph and a deterministic finite state graph as input, requiring through the
initial component that the state graph be in its start state. It then reduces the
string graph by applying actions of the form (read(x), go(x)) arbitrarily often,
i.e. by consuming an arbitrarily large prefix of the string and changing states


reduce
  decl:     (string, dfsg) -> (string, -) on ($B_{string}$, $B_{dfsg}$)
  initial:  (string, initialized)
  rules:    ($\mathcal{R}_{B_{string}}$, $\mathcal{R}_{B_{dfsg}}$)
  cond:     $a_1^*\,;\, a_2$ where
              $a_1 = \{(read(x), go(x)) \mid x \in I\}$ and
              $a_2$ = (-, is-final)
  terminal: (string, dfsg)

Fig. 3. A typed unit that returns a modified input graph as output.
accordingly in the state graph, and returns the residue of the string graph as
output, but only if the consumed prefix is recognized by the state graph, i.e.
only if the action (-, is-final) is applied exactly once.



recognize-substitution
  decl:     (string, dfsg, dfsg, dfsg, -) -> (-, -, -, -, bool) on
            ($B_{string}$, $B_{dfsg}$, $B_{dfsg}$, $B_{dfsg}$, $B_{bool}$)
  initial:  (string, initialized, initialized, initialized, FALSE)
  uses:     reduce
  rules:    ({is-empty}, $\mathcal{R}_{B_{dfsg}}$, $\emptyset$, $\emptyset$, {set-to-true})
  cond:     $(a_1 \mid a_2)^*\,;\, a_3$ where
              $a_1$ = (reduce(1, 3), go(a), -, -, -),
              $a_2$ = (reduce(1, 4), go(b), -, -, -), and
              $a_3$ = (is-empty, is-final, -, -, set-to-true)
  terminal: (string, dfsg, dfsg, dfsg, bool)

Fig. 4. A typed unit with imported units combined in an action.



The typed unit recognize-substitution shown in Figure 4 makes use of reduce
in order to decide whether an input string graph is in the substitution
language given as further input by three deterministic finite state graphs
$A, A_a, A_b$ that define $L, L_a, L_b$, in that order. Initially, the state
graphs must once again be in their respective start states and the value in the
output component is false. The idea is to guess, symbol by symbol, a string
$w \in L$ such that the input string is in $subst(w)$. If the next symbol is
guessed to be a, the action (reduce(1, 3), go(a), -, -, -) is applied that runs
$A_a$ to delete a prefix belonging to $L_a$ from the input string
(reduce(1, 3)) and simultaneously executes the next state transition for a in A
(go(a)). The action (reduce(1, 4), go(b), -, -, -) works analogously for the
symbol b. Thus, recognize-substitution is an example of a typed unit that
combines an imported unit (reduce) and a rule (go(x)) in an action. Finally, a
mandatory application of the action (is-empty, is-final, -, -, set-to-true)
produces the output value true, but only if the input string is completely
consumed and A is in some final state.

It may be noted that even though the finite state graphs are deterministic,
there are two sources of nondeterminism in this model: The symbols of the
supposed string $w \in L$ must be guessed as well as a prefix of the input
string for each such symbol. Consequently, the model admits only tuples with
output TRUE in its semantics.
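The two sources of nondeterminism can be made explicit by an exhaustive search over all guesses. The sketch below is an illustration only: strings are Python strings, automata use the assumed dictionary encoding with keys `start`, `finals` and `delta`, and the function names are hypothetical. A result of `False` means that no guess succeeds, i.e. the input contributes no pair to the semantics of the unit.

```python
def accepted_prefix_ends(word, pos, dfa):
    """Yield every q >= pos such that dfa accepts word[pos:q] (the effect of reduce)."""
    state = dfa["start"]
    q = pos
    while True:
        if state in dfa["finals"]:
            yield q
        if q == len(word) or (state, word[q]) not in dfa["delta"]:
            return
        state = dfa["delta"][(state, word[q])]
        q += 1


def recognize_substitution(word, dfa_l, dfa_by_symbol):
    """Search all runs of (a1 | a2)*; a3 for an input word.

    A configuration is (position in the input, current state of A).
    Guessing symbol x performs go(x) in A and removes a prefix in L_x.
    """
    start = (0, dfa_l["start"])
    seen = {start}
    stack = [start]
    while stack:
        pos, state = stack.pop()
        # a3: is-empty on the string and is-final on A.
        if pos == len(word) and state in dfa_l["finals"]:
            return True
        for symbol, dfa_x in dfa_by_symbol.items():
            if (state, symbol) not in dfa_l["delta"]:
                continue
            next_state = dfa_l["delta"][(state, symbol)]
            for q in accepted_prefix_ends(word, pos, dfa_x):
                config = (q, next_state)
                if config not in seen:
                    seen.add(config)
                    stack.append(config)
    return False


# Assumed example: L = {ab}, L_a = a+ and L_b = {b}, so subst(L) = a+ b.
A = {"start": 0, "finals": {2}, "delta": {(0, "a"): 1, (1, "b"): 2}}
A_a = {"start": 0, "finals": {1}, "delta": {(0, "a"): 1, (1, "a"): 1}}
A_b = {"start": 0, "finals": {1}, "delta": {(0, "b"): 1}}
```

The set `seen` keeps the search finite even when some language contains the empty string, because configurations are never revisited.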

4.3 Semantics of Typed Graph Transformation Units 

Typed transformation units transform initial graph tuples to terminal graph 
tuples by applying a sequence of actions so that the control condition is satisfied. 
Moreover, the mappings in and out of the declaration part prescribe for every 
such transformation the input and output graph tuples of the unit. Hence, the 
semantics of a typed transformation unit can be defined as a k,l-relation between
input and output graphs.

Let $tgtu = (in \to out \text{ on } BT, (I, U, R, C, T))$ be a typed
transformation unit with $BT = (B_1, \ldots, B_n)$, $in\colon [k] \to [n]$,
$out\colon [l] \to [n]$, and $R = (R_1, \ldots, R_n)$. If $U = \emptyset$, tgtu
transforms internally a tuple
$G \in \mathcal{G}_{B_1} \times \cdots \times \mathcal{G}_{B_n}$ into a tuple
$H \in \mathcal{G}_{B_1} \times \cdots \times \mathcal{G}_{B_n}$ if and only if

1. G is an initial graph tuple and H is a terminal graph tuple, i.e.
$(G, H) \in SEM(I) \times SEM(T)$;

2. H is obtained from G via a sequence of basic actions over
$(R_1, \ldots, R_n)$, i.e. $G \Rightarrow^{*}_{ACT(tgtu)} H$ where $ACT(tgtu)$
is the set of all basic actions $a = (a_1, \ldots, a_n)$ of BT such that for
$i = 1, \ldots, n$, $a_i \in R_i$ if $a_i \neq -$, and

3. the pair (G, H) is allowed by the control condition, i.e. $(G, H) \in SEM(C)$.

If the transformation unit tgtu has a non-empty import, the imported units can 
also be applied in a transformation from G to H . This requires that we extend 
the notion of basic actions so that calls of imported typed units are allowed, 
leading to the notion of (general) actions. 

Formally, an action of tgtu is a tuple $a = (a_1, \ldots, a_n)$ such that for
$i = 1, \ldots, n$ we have $a_i \in R_i$, or $a_i = -$, or $a_i$ is of the form
(u, input, output) where $u \in U$, $input\colon [k_u] \to [n]$ with
$\mathcal{G}_{B_{input(j)}} \subseteq intype_u(j)$ for $j = 1, \ldots, k_u$, and
$output \in [l_u]$ with $outtype_u(output) \subseteq \mathcal{G}_{B_i}$. In the
latter case, we denote $a_i$ by $u(input(1), \ldots, input(k_u))(output)$, and
shorter by $u(input(1), \ldots, input(k_u))$ if u has a unique output, i.e.
$l_u = 1 = output$.

The application of an action $a = (a_1, \ldots, a_n)$ to a current graph tuple
of n typed graphs works as follows: As for typed rule application, if $a_i$ is a
rule of $R_i$, it is applied to the ith graph. If $a_i$ is equal to -, the ith
graph remains unchanged. The new aspect is the third case where $a_i$ is of the
form (u, input, output). In this case, the mapping $input\colon [k_u] \to [n]$
determines which graphs of the current tuple of typed graphs should be chosen as
input for the imported unit u. The output $output \in [l_u]$ specifies which
component of the computed output graph tuple of u should be assigned to the ith
component of the graph tuple obtained from applying the typed unit u to the
input graphs selected by input.

For example, the action (-, -, -, recognize(1, 2), recognize(1, 3), -) of the
typed unit recognize-intersection has as semantics every pair
$((G_1, \ldots, G_6), (H_1, \ldots, H_6))$ such that $G_i = H_i$ for
$i \in \{1, 2, 3, 6\}$, $H_4$ is the output of recognize applied to
$(G_1, G_2)$, and $H_5$ is the output of recognize applied to $(G_1, G_3)$.

Formally, assume that every imported typed unit u of tgtu defines a semantic
relation

$SEM(u) \subseteq (intype_u(1) \times \cdots \times intype_u(k_u)) \times (outtype_u(1) \times \cdots \times outtype_u(l_u))$.

Then every pair $((G_1, \ldots, G_n), (H_1, \ldots, H_n))$ of graph tuples over
BT is in the semantics of an action $a = (a_1, \ldots, a_n)$ of tgtu if for
$i = 1, \ldots, n$:

- $G_i \Rightarrow_{a_i} H_i$ if $a_i \in R_i$,

- $G_i = H_i$ if $a_i = -$, and

- $H_i = H'_{output}$ if $a_i = (u, input, output)$ and
$((G_{input(1)}, \ldots, G_{input(k_u)}), (H'_1, \ldots, H'_{l_u})) \in SEM(u)$.

The set of all actions of tgtu is denoted by $ACT(tgtu)$ and the semantics of an
action $a \in ACT(tgtu)$ by $SEM(a)$.
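The three cases of an action component can be sketched as follows. The encoding is an assumption made for illustration: "graphs" are arbitrary Python values, a rule is a function from a graph to a list of possible results, an imported unit is a function from an input tuple to a list of output tuples, and the tags `"rule"` and `"call"` are hypothetical.

```python
from itertools import product


def apply_action(action, graphs):
    """Return all tuples H such that (graphs, H) is in the semantics of `action`.

    Each component of `action` is one of (assumed encoding):
      None                           -- the component stays unchanged ('-')
      ("rule", f)                    -- f(graph) lists the possible results
      ("call", unit, inputs, output) -- unit(tuple) lists output tuples;
                                        inputs and output are 1-based indices
    """
    options = []
    for i, comp in enumerate(action):
        if comp is None:
            options.append([graphs[i]])
        elif comp[0] == "rule":
            options.append(list(comp[1](graphs[i])))
        else:
            _, unit, inputs, output = comp
            args = tuple(graphs[j - 1] for j in inputs)
            options.append([outs[output - 1] for outs in unit(args)])
    # All components read the same current tuple, so the results combine freely.
    return [tuple(h) for h in product(*options)]


# Toy instance with integers standing in for graphs (purely illustrative).
increment = lambda g: [g + 1]
add_unit = lambda args: [(args[0] + args[1],)]
toy_action = (None, ("rule", increment), ("call", add_unit, (1, 2), 1))
successors = apply_action(toy_action, (1, 2, 0))
```

If any component admits no result, for instance because a rule is not applicable, the product is empty and the action as a whole is not applicable.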

Now we can define the semantics of tgtu as follows. Every pair
$((G_1, \ldots, G_k), (H_1, \ldots, H_l))$ is in $SEM(tgtu)$ if there is a pair
$(\bar{G}, \bar{H})$ with $\bar{G} = (\bar{G}_1, \ldots, \bar{G}_n)$,
$\bar{H} = (\bar{H}_1, \ldots, \bar{H}_n)$ such that the following holds.

- $(G_1, \ldots, G_k) = (\bar{G}_{in(1)}, \ldots, \bar{G}_{in(k)})$,

- $(H_1, \ldots, H_l) = (\bar{H}_{out(1)}, \ldots, \bar{H}_{out(l)})$,

- $(\bar{G}, \bar{H}) \in (SEM(I) \times SEM(T)) \cap SEM(C)$,

- $(\bar{G}, \bar{H}) \in \left(\bigcup_{a \in ACT(tgtu)} SEM(a)\right)^{*}$.

For example, the semantics of the typed unit recognize consists of all pairs
of the form $((G_1, G_2), (H))$ where $G_1$ is a string graph, $G_2$ is a
deterministic finite state graph with its start state as current state, and
H = TRUE if $G_1$ is recognized by $G_2$; otherwise H = FALSE. The semantics of
the typed unit recognize-intersection consists of every pair
$((G_1, G_2, G_3), (H))$ where $G_1$ is a string graph, $G_2$ and $G_3$ are
deterministic finite state graphs with their respective start state as current
state, and H = TRUE if $G_1$ is recognized by $G_2$ and $G_3$; otherwise
H = FALSE. The semantics of the typed unit reduce contains all pairs
$((G_1, G_2), (G_3))$ where $G_1$ and $G_3$ are string graphs and $G_2$ is a
deterministic finite state graph with its start state as current state such that
$G_3$ represents some suffix of the string represented by $G_1$ and $G_2$
recognizes the corresponding "prefix" of $G_1$. The semantics of
recognize-substitution contains all pairs $((G_1, G_2, G_3, G_4), (TRUE))$ where
$G_1$ represents a string in the substitution language $subst(L)$, $G_2$
recognizes the language L, and $G_3$ and $G_4$ recognize the languages
$subst(a)$ and $subst(b)$, respectively.

5 Conclusion 

In this paper, we have introduced the new concept of typed graph transformation 
units, which is helpful to specify structured and parallel graph transformations 
with a flexible typing. To this aim a typed transformation unit contains an import 
component which consists of a set of other typed transformation units. The 
semantic relations offered by the imported typed units are used by the importing 
unit. The nondeterminism inherent to rule-based graph transformation can be 
reduced with control conditions and graph tuple class expressions. 

Typed transformation units are a generalization of transformation units [10]
in the following aspects. (1) Whereas a transformation unit specifies a binary
relation on a single graph type, a typed transformation unit specifies a
k,l-relation of graphs of different types. (2) The transformation process in
transformation units is basically sequential whereas in typed transformation
units typed graphs are transformed simultaneously. Moreover, as described in
Section 2.2, typed transformation units generalize the concept of product units
[9] that also specify k,l-relations of typed graphs. With product units,
however, the possibilities of structuring (and modelling) are more restrictive
in the sense that only rules and transformation units can be applied to graph
tuples but no imported typed transformation unit.

Further investigation of typed transformation units may concern the following 
aspects. (1) We used graph-transformational versions of the truth values, but one 
may like to combine graph types directly with arbitrary abstract data types, 
i.e. without previously modelling the abstract data types as graphs. (2) In the 
presented definition, we consider acyclic import structures. Their generalization 
to networks of typed transformation units with an arbitrary import structure is 
an interesting task. (3) In the presented approach the graphs of the tuples do 
not share common parts. Hence, one could consider graph tuple transformation 
where some relations (like morphisms) can be explicitly specified between the 
different graphs of a tuple. (4) Apart from generalizing the concept of typed 
transformation units, a comparison with similar concepts such as pair grammars 
[14] and triple grammars [16] is needed. (5) Finally, case studies of typed units
should also be worked out in order to gain experience with the usefulness of the
concept for the modelling of (data-processing) systems and systems from other
application areas.
Abstract. Graph programs as introduced by Habel and Plump [8] provide
a simple yet computationally complete language for computing functions
and relations on graphs. We extend this language such that numerical
computations on labels can be conveniently expressed. Rather than
resorting to some kind of attributed graph transformation, we introduce 
conditional rule schemata which are instantiated to (conditional) double- 
pushout rules over ordinary graphs. A guiding principle in our language 
extension is syntactic and semantic simplicity. As a case study for the 
use of extended graph programs, we present and analyse two versions 
of Dijkstra’s shortest path algorithm. The first program consists of just 
three rule schemata and is easily proved to be correct but can be ex- 
ponential in the number of rule applications. The second program is a 
refinement of the first which is essentially deterministic and uses at most 
a quadratic number of rule applications. 



1 Introduction 

The graph transformation language introduced by Habel and Plump in [8] and 
later simplified in [7] consists of just three programming constructs: nondeter- 
ministic application of a set of rules (in the double-pushout approach) either in 
one step or as long as possible, and sequential composition. The language has a 
simple formal semantics and is both computationally complete and minimal [7]. 
These properties are attractive for formal reasoning on programs, but the price 
for simplicity is a lack of programming comfort. 

This paper is the first step in developing the language of [7] to a programming 
language GP (for graph programs) that is usable in practice. The goal is to design 
- and ultimately implement - a semantics-based language that allows high-level 
problem solving by graph transformation. We believe that such a language will be 
amenable to formal reasoning if programs can be mapped to a core language with 
a simple formal semantics. Also, graphs and graph transformations naturally lend 
themselves to visualisation which will facilitate the understanding of programs. 

The language of [7] has no built-in data types so that, for example, numerical 
computations on labels must be encoded in a clumsy way. We therefore extend 
graph programs such that operations on labels are performed in a predefined 
algebra. Syntactically, programs are based on rule schemata labelled with terms 
over the algebra, which prior to their application are instantiated to ordinary 
double-pushout rules. In this way we can rely on the well-researched
double-pushout approach to graph transformation [2, 6] and avoid resorting to
some kind of attributed graph transformation. We also introduce conditional rule
schemata which are rule schemata equipped with a Boolean term over operation
symbols and a special edge predicate. This allows rule schema applications to be
controlled by comparing values of labels and checking the (non-)existence of
edges.

To find out what constructs should be added to the language of [7] to make 
GP practical, we intend to carry out various case studies. Graph algorithms are 
a natural choice for the field of such a study because the problem domain need 
not be encoded and there exists a comprehensive literature on graph algorithms. 
In Section 7 we present and analyse two graph programs for Dijkstra’s shortest 
path algorithm. The first program contains just three rule schemata but can be 
inefficient, while the second program is closer to Dijkstra’s original algorithm and 
needs at most a quadratic number of rule applications. We prove the correctness 
of the first program and the quadratic complexity of the second program to 
demonstrate how one can formally reason on graph programs. 

In general, we want to keep the syntax and semantics of GP as simple as
possible while simultaneously providing sufficient programming comfort. Of course
there is a trade-off between these aims; for example, we found it necessary to
introduce a while loop in order to efficiently code Dijkstra's algorithm in the
second program.



2 Preliminaries 

A signature $\Sigma = (S, OP)$ consists of a set S of sorts and a family
$OP = (OP_{w,s})_{w \in S^*, s \in S}$ of operation symbols. A family
$X = (X_s)_{s \in S}$ of variables consists of sets $X_s$ that are pairwise
disjoint and disjoint with OP. The sets $T_{OP,s}(X)$ of terms of sort s are
defined by $x, c \in T_{OP,s}(X)$ for all $x \in X_s$ and all
$c \in OP_{\lambda,s}$, and $op(t_1, \ldots, t_n) \in T_{OP,s}(X)$ for all
$op \in OP_{s_1 \ldots s_n, s}$ and all
$t_1 \in T_{OP,s_1}(X), \ldots, t_n \in T_{OP,s_n}(X)$. The set of all terms
over $\Sigma$ and X is denoted by $T_\Sigma(X)$.

A $\Sigma$-algebra A consists of a family of nonempty sets $(A_s)_{s \in S}$,
elements $c_A \in A_s$ for all $c \in OP_{\lambda,s}$, and functions
$op_A\colon A_{s_1} \times \ldots \times A_{s_n} \to A_s$ for all
$op \in OP_{s_1 \ldots s_n, s}$.

An assignment $\alpha\colon X \to A$ is a family of mappings
$(\alpha_s\colon X_s \to A_s)_{s \in S}$. The extension
$\bar{\alpha}\colon T_\Sigma(X) \to A$ of $\alpha$ is defined by
$\bar{\alpha}(x) = \alpha(x)$ and $\bar{\alpha}(c) = c_A$ for all variables x
and all constant symbols c, and
$\bar{\alpha}(op(t_1, \ldots, t_n)) = op_A(\bar{\alpha}(t_1), \ldots, \bar{\alpha}(t_n))$
for all $op(t_1, \ldots, t_n) \in T_\Sigma(X)$. If t is a variable-free term,
then $\bar{\alpha}(t)$ is denoted by $t_A$.
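The extension of an assignment to terms is ordinary recursive term evaluation. The following sketch assumes a simple, hypothetical encoding: a variable is a Python string, a term $op(t_1, \ldots, t_n)$ is a tuple whose first entry names the operation (constants are tuples of length one), and an algebra is a dictionary from operation names to Python functions. Sorts are not checked.

```python
import operator


def evaluate(term, algebra, assignment):
    """Evaluate a term under an assignment (the extension of the assignment).

    Assumed encoding: a variable is a string; an operation application
    op(t1, ..., tn) is the tuple (op, t1, ..., tn); a constant c is (c,).
    """
    if isinstance(term, str):
        return assignment[term]
    op, *args = term
    return algebra[op](*(evaluate(t, algebra, assignment) for t in args))


# A small algebra over numbers and truth values (illustrative only).
algebra = {
    "0": lambda: 0,
    "1": lambda: 1,
    "+": operator.add,
    "<": operator.lt,
}
term = ("<", ("+", "x", "y"), "z")      # the term x + y < z
value = evaluate(term, algebra, {"x": 1, "y": 2, "z": 4})
```

A variable-free term such as `("+", ("1",), ("1",))` can be evaluated with an empty assignment, which corresponds to the notation $t_A$.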

A label alphabet is a pair $\mathcal{C} = (\mathcal{C}_V, \mathcal{C}_E)$, where
$\mathcal{C}_V$ is a set of node labels and $\mathcal{C}_E$ is a set of edge
labels. A partially labelled graph over $\mathcal{C}$ is a system
$G = (V_G, E_G, s_G, t_G, l_{G,V}, l_{G,E})$, where $V_G$ and $E_G$ are finite
sets of nodes and edges, $s_G, t_G\colon E_G \to V_G$ are source and target
functions for edges, $l_{G,V}\colon V_G \to \mathcal{C}_V$ is the partial node
labelling function and $l_{G,E}\colon E_G \to \mathcal{C}_E$ is the partial edge
labelling function¹. A graph is totally labelled if $l_{G,V}$ and $l_{G,E}$ are
total functions. We write $\mathcal{G}(\mathcal{C})$ for the set of partially
labelled graphs, and $\mathcal{G}^t(\mathcal{C})$ for the set of totally
labelled graphs over $\mathcal{C}$.

A premorphism $g\colon G \to H$ between two graphs G and H consists of two
source and target preserving functions $g_V\colon V_G \to V_H$ and
$g_E\colon E_G \to E_H$, that is, $s_H \circ g_E = g_V \circ s_G$ and
$t_H \circ g_E = g_V \circ t_G$. If g also preserves labels in the sense that
$l_H(g(n)) = l_G(n)$ for all n in $Dom(l_{G,V})$ and $Dom(l_{G,E})$, then it is
a graph morphism. Moreover, g is injective if $g_V$ and $g_E$ are injective, and
it is an inclusion if $g(n) = n$ for all nodes and edges n in G.
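The difference between a premorphism and a graph morphism can be checked mechanically. The sketch below assumes a hypothetical dictionary encoding of partially labelled graphs: `src` and `tgt` map edges to nodes, and `vlabel` and `elabel` contain entries only for labelled nodes and edges.

```python
def is_premorphism(G, H, g_v, g_e):
    """Check that the node map g_v and edge map g_e preserve sources and targets."""
    return all(
        H["src"][g_e[e]] == g_v[G["src"][e]]
        and H["tgt"][g_e[e]] == g_v[G["tgt"][e]]
        for e in G["src"]
    )


def is_graph_morphism(G, H, g_v, g_e):
    """A premorphism that also preserves the labels defined in G."""
    nodes_ok = all(H["vlabel"].get(g_v[n]) == lab for n, lab in G["vlabel"].items())
    edges_ok = all(H["elabel"].get(g_e[e]) == lab for e, lab in G["elabel"].items())
    return is_premorphism(G, H, g_v, g_e) and nodes_ok and edges_ok


# G: nodes 1, 2 with one edge; node 2 is unlabelled (partial labelling).
G = {"src": {"a": 1}, "tgt": {"a": 2}, "vlabel": {1: "x"}, "elabel": {"a": 3}}
# H: totally labelled host graph.
H = {"src": {"e": "u"}, "tgt": {"e": "v"},
     "vlabel": {"u": "x", "v": "y"}, "elabel": {"e": 3}}
g_v, g_e = {1: "u", 2: "v"}, {"a": "e"}
```

Only labels defined in G are compared, so an unlabelled node of G may be mapped to a node with any label.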

Assumption 1 We assume a signature $\Sigma = (S, OP)$ such that $Bool \in S$,
$OP_{\lambda,Bool} = \{true, false\}$, $OP_{Bool,Bool} = \{\neg\}$ and
$OP_{Bool\,Bool,Bool} = \{\wedge, \vee, \to, \leftrightarrow\}$. The signature
is interpreted in a fixed $\Sigma$-algebra A such that $A_{Bool} = \{tt, ff\}$,
$true_A = tt$, $false_A = ff$ and
$\neg_A, \wedge_A, \vee_A, \to_A, \leftrightarrow_A$ are the usual Boolean
operations. We also assume a family of variables $X = (X_s)_{s \in S}$ and that
S contains two distinguished sorts $s_V$ and $s_E$ for nodes and edges. The
label alphabets $\mathcal{C}_T$ and $\mathcal{C}_A$ are defined by

$\mathcal{C}_T = (T_{OP,s_V}(X), T_{OP,s_E}(X))$ and $\mathcal{C}_A = (A_{s_V}, A_{s_E})$.

3 Rules and Rule Schemata 

We recall the definition of double-pushout rules with relabelling given in [9],
before introducing rule schemata over $\mathcal{G}(\mathcal{C}_T)$.

Definition 1 (Rule). A rule $r = (L \leftarrow K \to R)$ consists of two graph
morphisms $K \to L$ and $b\colon K \to R$ over $\mathcal{G}(\mathcal{C}_A)$ such
that $K \to L$ is an inclusion and

(1) for all $n \in L$, $l_L(n) = \bot$ implies $n \in K$ and $l_R(b(n)) = \bot$, and

(2) for all $n \in R$, $l_R(n) = \bot$ implies $l_L(n') = \bot$ for exactly one $n' \in b^{-1}(n)$.

The rule r is injective if $b\colon K \to R$ is injective. All rules in the graph
programs for Dijkstra's algorithm in Section 7 will be injective, but in general
we want to allow non-injective rules.

Definition 2 (Direct derivation). Let G and H be graphs in
$\mathcal{G}^t(\mathcal{C}_A)$ and $r = (L \leftarrow K \to R)$ a rule. A direct
derivation from G to H by r consists of two natural pushouts² as in Figure 1,
where $L \to G$ is injective.

We write $G \Rightarrow_{r,g} H$ or just $G \Rightarrow_r H$ if there exists a
direct derivation as in Definition 2. If $\mathcal{R}$ is a set of rules, then
$G \Rightarrow_{\mathcal{R}} H$ means that there is some r in $\mathcal{R}$ such
that $G \Rightarrow_r H$. Figure 2 shows an example of a rule where we assume
$A_{s_V} = A_{s_E} = \mathbb{R}$. (In pictures like this, numbers next to the
nodes are used to represent graph morphisms.)

¹ Given a partial function $f\colon A \to B$, the set
$Dom(f) = \{x \in A \mid f(x) \text{ is defined}\}$ is the domain of f. We write
$f(x) = \bot$ if f(x) is undefined.

² A pushout is natural if it is also a pullback. See [9] for the construction of
natural pushouts over partially labelled graphs.
    L <---- K ----> R
    |       |       |
    v       v       v
    G <---- D ----> H

Fig. 1. A direct derivation.

Definition 3 (Match). Given a rule $r = (L \leftarrow K \to R)$ and a graph G in
$\mathcal{G}(\mathcal{C}_A)$, an injective graph morphism $g\colon L \to G$ is a
match for r if it satisfies the dangling condition: no node in $g(L) - g(K)$ is
incident to an edge in $G - g(L)$.

In [9] it is shown that, given r and an injective morphism $g\colon L \to G$,
there exists a direct derivation as in Figure 1 if and only if g is a match for
r. Moreover, in this case D and H are determined uniquely up to isomorphism.
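The dangling condition is easy to test once a candidate morphism is known. The sketch below is a minimal illustration under assumed encodings: the host graph is given by a dictionary from edge identifiers to (source, target) pairs, and the injective morphism g by two dictionaries for nodes and edges. Since K is included in L and g is injective, the nodes in g(L) - g(K) are the images of the nodes of L that are not in K.

```python
def satisfies_dangling(host_edges, l_nodes, k_nodes, g_nodes, g_edges):
    """Check the dangling condition for an injective morphism g: L -> G.

    host_edges: {edge_id: (source, target)} of the host graph G (assumed encoding)
    l_nodes, k_nodes: node sets of L and of the interface K (K included in L)
    g_nodes, g_edges: images of the nodes and edges of L in G
    """
    deleted = {g_nodes[n] for n in l_nodes - k_nodes}
    matched_edges = set(g_edges.values())
    for edge, (source, target) in host_edges.items():
        # An edge outside g(L) must not touch a node that would be deleted.
        if edge not in matched_edges and (source in deleted or target in deleted):
            return False
    return True


# Host graph: u --e1--> v --e2--> w.  L has nodes 1, 2 and one edge; K keeps node 1.
host = {"e1": ("u", "v"), "e2": ("v", "w")}
dangling = satisfies_dangling(host, {1, 2}, {1}, {1: "u", 2: "v"}, {"a": "e1"})
clean = satisfies_dangling(host, {1, 2}, {1}, {1: "v", 2: "w"}, {"a": "e2"})
```

In the first call node v would be deleted while the unmatched edge e2 still leaves it, so the condition fails; in the second call only w is deleted and no unmatched edge touches it.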

Definition 4 (Rule schema). If $K \to L$ and $K \to R$ are graph morphisms
over $\mathcal{G}(\mathcal{C}_T)$ satisfying the conditions of Definition 1,
then $r = (L \leftarrow K \to R)$ is a rule schema.

An example of a rule schema is shown in Figure 3, where x, y and z are
variables of sort Real.

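[Diagram not recoverable: graphs L, K and R, each with two nodes numbered 1 and 2.]

Fig. 2. A rule.

[Diagram not recoverable: graphs L, K and R, each with two nodes numbered 1 and 2.]

Fig. 3. A rule schema.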



Rule schemata are instantiated by evaluating their terms according to some
assignment $\alpha\colon X \to A$.

Definition 5 (Instances of graphs and rule schemata). Given a graph G
over $\mathcal{C}_T$ and an assignment $\alpha\colon X \to A$, the instance
$G^\alpha$ of G is the graph over $\mathcal{C}_A$ obtained from G by replacing
the labelling functions $l_G$ with $\bar{\alpha} \circ l_G$. The instance of a
rule schema $r = (L \leftarrow K \to R)$ is the rule
$r^\alpha = (L^\alpha \leftarrow K^\alpha \to R^\alpha)$.

For example, the rule in Figure 2 is an instance of the rule schema in Figure 3;
the associated assignment $\alpha$ satisfies $\alpha(x) = 1$, $\alpha(y) = 2$
and $\alpha(z) = 4$. Note that a rule schema may have infinitely many instances
if A contains infinite base sets.
Given graphs G and H in $\mathcal{G}^t(\mathcal{C}_A)$ and a rule schema r, we
write $G \Rightarrow_r H$ if there is an assignment $\alpha$ such that
$G \Rightarrow_{r^\alpha} H$. For a set $\mathcal{R}$ of rule schemata,
$G \Rightarrow_{\mathcal{R}} H$ means that there is some r in $\mathcal{R}$ such
that $G \Rightarrow_r H$.

4 Conditional Rules and Conditional Rule Schemata 

We introduce conditional rule schemata which allow the application of a rule
schema to be controlled by comparing values of terms in the left-hand side of
the schema. This concept will be crucial to express graph algorithms
conveniently.

Analogously to the instantiation of rule schemata to rules, conditional rule 
schemata will be instantiated to conditional rules. We define a conditional rule 
as a rule together with a set of admissible matches. 

Definition 6 (Conditional rule). A conditional rule $q = (r, M)$ consists of
a rule $r = (L \leftarrow K \to R)$ and a set M of graph morphisms such that
$M \subseteq \{g\colon L \to G \mid G \in \mathcal{G}^t(\mathcal{C}_A) \text{ and } g \text{ is a match for } r\}$.

Intuitively, M is a predicate on the matches of r in totally labelled graphs.
Given a conditional rule $q = (r, M)$ and graphs G and H in
$\mathcal{G}^t(\mathcal{C}_A)$, we write $G \Rightarrow_q H$ if there is a
morphism g in M such that $G \Rightarrow_{r,g} H$.

Our concept of a conditional rule is similar to that of [5] where rules are
equipped with two sets of morphisms (representing positive and negative
application conditions, respectively). Because [5] is based on the so-called
single-pushout approach, admissible morphisms need not satisfy the dangling
condition.

Conditional rules as defined above are a semantic concept in that the set M 
of admissible matches will usually be infinite. To represent conditional rules in 
the syntax of a programming language, we introduce conditional rule schemata 
which consist of a rule schema and a Boolean term. This term may contain 
any operation symbols of the predefined signature $\Sigma$ and, in addition, a
special binary predicate edge on the nodes of the left-hand side of the rule
schema.

Definition 7 (Conditional rule schema). Given a rule schema
$(L \leftarrow K \to R)$, extend the signature $\Sigma$ to
$\Sigma^L = (S^L, OP^L)$ by $S^L = S \cup \{Node\}$,
$OP^L_{\lambda,Node} = V_L$, $OP^L_{Node\,Node,Bool} = \{edge\}$,
$OP^L_{w,s} = OP_{w,s}$ if $w \in S^*$ and $s \in S$, and
$OP^L_{w,s} = \emptyset$ otherwise. Then a term c in $T_{OP^L,Bool}(X)$ is a
condition and $((L \leftarrow K \to R), c)$ is a conditional rule schema.

A conditional rule schema is also written as $(L \leftarrow K \to R)$ where c.
In pictures, a rule or rule schema $(L \leftarrow K \to R)$ is often given in
the form $L \Rightarrow R$. In this case we assume that K consists of the
numbered nodes of L and that these nodes are unlabelled in K. For example,
Figure 4 shows a conditional rule schema that is applicable to a graph G only if
x, y and z are instantiated such that $\alpha(x) + \alpha(y) < \alpha(z)$ and if
there is no edge in G from the image of node 2 to the image of node 1.

Conditional rule schemata are instantiated by instantiating the rule schema
according to some assignment $\alpha$ and by evaluating the condition by an
extension of $\alpha$ which takes into account the meaning of the edge predicate.
[Diagram not recoverable: a rule schema $L \Rightarrow R$ on two nodes numbered 1 and 2.]

where $x + y < z \wedge \neg\, edge(2, 1)$

Fig. 4. A conditional rule schema.



Definition 8 (Instance of a conditional rule schema). Given a conditional 
rule schema $r = ((L \leftarrow K \rightarrow R), c)$, an assignment $\alpha\colon X \to A$ and a graph 
morphism $g\colon L^\alpha \to G$ with $G \in \mathcal{G}^t(\mathcal{C}_A)$, define the extension 
$\alpha_g\colon T_{\Sigma^L}(X) \to A$ as follows: 

(1) $\alpha_g(x) = \alpha(x)$ and $\alpha_g(c) = c_A$ for all variables $x$ and all constants $c$ 
    in $\Sigma$.$^3$ 

(2) $\alpha_g(\mathrm{edge}(v, w)) = \mathrm{tt}$ if there is an edge in $G$ from $g(v)$ to $g(w)$, 
    and $\alpha_g(\mathrm{edge}(v, w)) = \mathrm{ff}$ otherwise. 

(3) $\alpha_g(op(t_1, \ldots, t_n)) = op_A(\alpha_g(t_1), \ldots, \alpha_g(t_n))$ 
    for all $op(t_1, \ldots, t_n) \in T_{OP^L}(X)$ with $op \in OP$. 

Then the instance $r^\alpha$ of $r$ is the conditional rule 
$((L^\alpha \leftarrow K^\alpha \rightarrow R^\alpha), M)$ where 
$M = \{g\colon L^\alpha \to G \mid G \in \mathcal{G}^t(\mathcal{C}_A),\ g \text{ is a match and } \alpha_g(c) = \mathrm{tt}\}$. 



Given graphs $G$ and $H$ in $\mathcal{G}^t(\mathcal{C}_A)$ and a conditional rule schema 
$q = r$ where $c$, we write $G \Rightarrow_q H$ if there is an assignment $\alpha\colon X \to A$ 
and a graph morphism $g$ such that $G \Rightarrow_{r^\alpha, g} H$ and $\alpha_g(c) = \mathrm{tt}$. 

Operationally, the application of a conditional rule schema 
$(L \leftarrow K \rightarrow R)$ where $c$ to a graph $G$ in $\mathcal{G}^t(\mathcal{C}_A)$ amounts to the 
following steps: 



1. Find an injective premorphism $g\colon L \to G$ satisfying the dangling condition. 

2. Find an assignment $\alpha\colon X \to A$ such that for all $n$ in $\mathrm{Dom}(l_L)$, 
   $\alpha(l_L(n)) = l_G(g(n))$. 

3. Check whether $\alpha_g(c) = \mathrm{tt}$. 

4. Construct for $(L^\alpha \leftarrow K^\alpha \rightarrow R^\alpha)$ and $g$ the natural pushouts of 
   Definition 2 (according to [9]). 
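
Step 3 can be illustrated by a small evaluator for conditions in the spirit of 
Definition 8. The following Python sketch is not part of the paper: the tuple 
encoding of terms, the operator table OPS (including the Boolean symbols "and" 
and "not") and the representation of G's edges as a set of (source, target) 
pairs are assumptions made for illustration only. 

```python
import operator

# Assumed term encoding (not from the paper): nested tuples
#   ("var", name) | ("const", value) | ("node", id)
#   ("edge", node_term, node_term) | ("op", symbol, [subterms])
OPS = {
    "+": operator.add,
    "<": operator.lt,
    "not": operator.not_,
    "and": lambda a, b: a and b,
}

def evaluate(term, alpha, g, edges):
    """Evaluate a term under assignment alpha and node mapping g.

    alpha: dict from variable names to values
    g:     dict from nodes of L to nodes of G
    edges: set of (source, target) pairs of G
    """
    kind = term[0]
    if kind == "var":
        return alpha[term[1]]
    if kind == "const":
        return term[1]
    if kind == "edge":
        v, w = term[1][1], term[2][1]
        return (g[v], g[w]) in edges
    if kind == "op":
        return OPS[term[1]](*(evaluate(t, alpha, g, edges) for t in term[2]))
    # Bare node constants have no value, as noted in footnote 3.
    raise ValueError(f"no value for {term!r}")

# The condition of Figure 4: x + y < z and not edge(2, 1)
condition = ("op", "and", [
    ("op", "<", [("op", "+", [("var", "x"), ("var", "y")]), ("var", "z")]),
    ("op", "not", [("edge", ("node", 2), ("node", 1))]),
])
```

With alpha = {"x": 1, "y": 2, "z": 5} and g = {1: "a", 2: "b"}, the condition 
holds for a graph whose only edge runs from "a" to "b" and fails as soon as 
there is an edge from "b" to "a". 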



5 Deterministic Conditional Rule Schemata 

For an implementation of a programming language based on rule schemata it 
is prohibitive to enumerate all instances of a rule schema $r = (L \leftarrow K \rightarrow R)$ 
in order to find an instance that turns a given premorphism $g\colon L \to G$ into a 
graph morphism. This is because $r$ may have infinitely many instances. Even 
if one restricts attention to instances $r^\alpha$ where $\alpha$ evaluates the terms in $L$ to 
labels of corresponding nodes and edges in $G$, there may be infinitely many 
instances left. For example, consider the conditional rule schema in Figure 5 and 
an associated premorphism $g\colon L \to G$. Whereas the values $\alpha(k)$ and $\alpha(z)$ are 
uniquely determined by $g$, there are infinitely many choices for $\alpha(x)$, $\alpha(y)$ and 
$\alpha(m)$ if nodes are labelled with integers, say. We therefore introduce a subclass of 
(conditional) rule schemata which are instantiated by premorphisms in at most 
one way. 

$^3$ Note that $\alpha_g$ is undefined for all constants in $OP^L_{\lambda,\mathrm{Node}}$. 

[Figure 5: rule schema with numbered nodes 1 and 2 on both sides (drawing not 
recoverable), where $m < z$] 

Fig. 5. A conditional rule schema that is not deterministic. 

A term $t$ in $T_\Sigma(X)$ is simple if it is a variable or does not contain any 
variables. We denote by $\mathrm{Var}(t)$ and $\mathrm{Var}(G)$ the sets of variables occurring in a 
term $t$ or graph $G$. 

Definition 9 (Deterministic conditional rule schema). A rule schema 
$(L \leftarrow K \rightarrow R)$ is deterministic, if 

(1) all labels in $L$ are simple terms, and 

(2) $\mathrm{Var}(R) \subseteq \mathrm{Var}(L)$. 

A conditional rule schema $(r, c)$ with $r = (L \leftarrow K \rightarrow R)$ is deterministic if $r$ 
is deterministic and $\mathrm{Var}(c) \subseteq \mathrm{Var}(L)$. 

For example, the conditional rule schema in Figure 4 is deterministic. 
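
The conditions of Definition 9 are purely syntactic and can be checked 
mechanically. The Python sketch below is an illustration only: terms are 
encoded as nested tuples ("var", name), ("const", value) or 
("op", symbol, [subterms]), and a schema is represented just by the lists of 
labels of L and R plus an optional condition. This encoding and the function 
names are assumptions, not notation from the paper, and the example labels are 
hypothetical (the drawings of Figures 4 and 5 are not reproduced here). 

```python
def variables(term):
    """Set of variable names occurring in a term."""
    if term[0] == "var":
        return {term[1]}
    if term[0] == "op":
        return set().union(*(variables(t) for t in term[2]))
    return set()  # constants contain no variables

def is_simple(term):
    """A term is simple if it is a variable or is variable-free."""
    return term[0] == "var" or not variables(term)

def is_deterministic(left_labels, right_labels, condition=None):
    """Check Definition 9 for a (conditional) rule schema."""
    left_vars = set().union(*(variables(t) for t in left_labels))
    right_vars = set().union(*(variables(t) for t in right_labels))
    cond_vars = variables(condition) if condition is not None else set()
    return (all(is_simple(t) for t in left_labels)
            and right_vars <= left_vars
            and cond_vars <= left_vars)

x, y, z, m = (("var", name) for name in "xyzm")
plus_xy = ("op", "+", [x, y])

# Hypothetical schema with simple labels x, y, z on the left: deterministic.
ok = is_deterministic([x, y, z], [plus_xy], ("op", "<", [plus_xy, z]))

# Hypothetical schema with the non-simple label x + y on the left and a
# condition mentioning a variable m that does not occur in L: not deterministic.
bad = is_deterministic([plus_xy, z], [z], ("op", "<", [m, z]))
```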

Proposition 1. Let $r = ((L \leftarrow K \rightarrow R), c)$ be a deterministic conditional rule 
schema and $g\colon L \to G$ a premorphism with $G \in \mathcal{G}^t(\mathcal{C}_A)$. Then there is at most 
one instance $r'$ of $r$ such that $g$ is a match for $r'$. 

Proof. Let $r^\alpha$ and $r^\beta$ be instances of $r$ such that $g$ is a match for both. By 
Definition 5 and Definition 8, we have $r^\alpha = r^\beta$ if $\alpha(t) = \beta(t)$ for all terms $t$ 
in $L$ and $R$, and $\alpha_g(c) = \beta_g(c)$ (note that every term in $K$ occurs also in $L$). 
Therefore it suffices to show that $\alpha(x) = \beta(x)$ for each variable $x$ in 
$\mathrm{Var}(L) \cup \mathrm{Var}(R) \cup \mathrm{Var}(c)$. Since $r$ is deterministic, we have $x \in \mathrm{Var}(L)$. 
Hence there is a node or an edge in $L$ that is labelled with a term containing $x$. 
Without loss of generality let $v$ be a node such that $x \in \mathrm{Var}(l_{L,V}(v))$. Because 
all terms in $L$ are simple, $x = l_{L,V}(v)$. Thus, by Definition 5, 
$\alpha(x) = \alpha(l_{L,V}(v)) = l_{G,V}(g_V(v)) = \beta(l_{L,V}(v)) = \beta(x)$. □ 

Proposition 1 ensures that premorphisms cannot “instantiate” deterministic 
(conditional) rule schemata in more than one way. The next proposition gives 
a necessary and sufficient condition for such an instantiation to take place. The 
condition makes precise how to find an assignment $\alpha$ as required in the second 
step of the description of rule-schema application, given at the end of Section 4. 

Proposition 2. Let $g\colon L \to G$ be a premorphism where $L \in \mathcal{G}(\mathcal{C}_T)$ is labelled 
with simple terms and $G \in \mathcal{G}^t(\mathcal{C}_A)$. Then there is an assignment $\alpha\colon X \to A$ 
such that $g$ is a graph morphism from $L^\alpha$ to $G$, if and only if for all nodes and 
edges $n, n'$ in $L$, 

(1) $l_G(g(n)) = t_A$ if $l_L(n)$ is a variable-free term $t$, and 

(2) $l_G(g(n)) = l_G(g(n'))$ if $l_L(n) = l_L(n') \in X$. 

Proof. Suppose first that $g$ is a graph morphism from $L^\alpha$ to $G$. If $n$ is labelled 
with a variable-free term $t$ in $L$, then $n$'s label in $L^\alpha$ is $\alpha(t) = t_A$. Since $g$ is 
label-preserving, $g(n)$ is labelled with $t_A$, too. Moreover, if $n$ and $n'$ are labelled 
with the same variable $x$ in $L$, then both are labelled with $\alpha(x)$ in $L^\alpha$. Hence 
$l_G(g(n)) = \alpha(x) = l_G(g(n'))$. 

Conversely, suppose that conditions (1) and (2) are satisfied. For every sort 
$s$ in $S$, let $d_s$ be a fixed element in $A_s$. Then, by (2), 

$$\alpha(x) = \begin{cases} l_G(g(n)) & \text{if there is a node or edge } n \text{ with } l_L(n) = x, \\ d_s & \text{otherwise, where } x \in X_s \end{cases}$$ 

defines an assignment $\alpha\colon X \to A$. Consider any node or edge $n$ in $L^\alpha$. If $l_L(n)$ 
is a variable-free term $t$, then (1) gives $l_G(g(n)) = t_A = \alpha(t) = l_{L^\alpha}(n)$. Otherwise 
$l_L(n)$ is a variable $x$, and hence by definition of $\alpha$, $l_G(g(n)) = \alpha(x) = l_{L^\alpha}(n)$. 
Thus $g\colon L^\alpha \to G$ is label-preserving. □ 
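
The construction in this proof is effectively an algorithm for the second step 
of rule-schema application. A minimal Python sketch, under the same assumed 
tuple encoding of terms as ("var", name), ("const", value) and 
("op", symbol, [subterms]), could look as follows. The dictionaries labels_L, 
labels_G and g, and the helper ground_value, are hypothetical names introduced 
for illustration; variables that do not occur in L are simply left unbound 
instead of being mapped to a default element d_s. 

```python
def ground_value(term):
    """Value t_A of a variable-free term; only constants and + are covered here."""
    if term[0] == "const":
        return term[1]
    if term[0] == "op" and term[1] == "+":
        return sum(ground_value(t) for t in term[2])
    raise ValueError(f"unsupported ground term {term!r}")

def find_assignment(labels_L, labels_G, g):
    """Return an assignment for the variables of L, or None if none exists.

    labels_L: dict from items of L to simple terms
    labels_G: dict from items of G to values
    g:        dict from items of L to items of G (the premorphism)
    """
    alpha = {}
    for n, term in labels_L.items():
        target = labels_G[g[n]]
        if term[0] == "var":
            # Condition (2): equal variables must see equal labels in G.
            if alpha.setdefault(term[1], target) != target:
                return None
        elif ground_value(term) != target:
            # Condition (1): a variable-free label must match exactly.
            return None
    return alpha
```

Each item of L is inspected once, so the assignment is found (or shown not to 
exist) in time linear in the size of L. 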

6 Graph Programs 

We extend the language of [8, 7] by replacing rules with deterministic conditional 
rule schemata and adding a while-loop. 

Definition 10 (Syntax of programs). Programs are defined as follows: 

(1) For every finite set $R$ of deterministic conditional rule schemata, $R$ and 
    $R{\downarrow}$ are programs. 

(2) For every graph $B$ in $\mathcal{G}(\mathcal{C}_T)$ and program $P$, while $B$ do $P$ end is a program. 

(3) If $P$ and $Q$ are programs, then $P; Q$ is a program. 

A finite set of conditional rule schemata is called an elementary program. 
Our syntax is ambiguous because a program $P_1; P_2; P_3$ can be parsed as both 
$(P_1; P_2); P_3$ and $P_1; (P_2; P_3)$. This is irrelevant, however, as the semantics of 
sequential composition will be relation composition, which is associative. 

Next we define a relational semantics for programs. Given a binary relation 
$\phi \subseteq A \times B$ between two sets $A$ and $B$, the domain of $\phi$ is the set 
$\mathrm{Dom}(\phi) = \{a \in A \mid a \mathrel{\phi} b \text{ for some } b \in B\}$. If $A = B$ we write $\phi^*$ for 
the reflexive-transitive closure of $\phi$. The composition of two relations $\phi$ and 
$\psi$ on $A$ is the relation $\phi \circ \psi = \{(a, c) \mid a \mathrel{\phi} b \text{ and } b \mathrel{\psi} c \text{ for some } b\}$. 
Given a graph $B$ in $\mathcal{G}(\mathcal{C}_T)$, let $B? = \{(B \leftarrow B \rightarrow B)\}$ with $B \rightarrow B$ being 
the identity morphism on $B$. 

Definition 11 (Semantics of programs). The semantics of a program $P$ is 
a binary relation $\llbracket P \rrbracket$ on $\mathcal{G}^t(\mathcal{C}_A)$ $^4$ which is inductively defined as follows: 

(1) For every elementary program $R$, $\llbracket R \rrbracket = {\Rightarrow_R}$. 

(2) $\llbracket R{\downarrow} \rrbracket = \{(G, H) \mid G \Rightarrow_R^* H \text{ and } H \notin \mathrm{Dom}(\Rightarrow_R)\}$. 

(3) $\llbracket \text{while } B \text{ do } P \text{ end} \rrbracket = \{(G, H) \in \llbracket B?; P \rrbracket^* \mid H \notin \mathrm{Dom}(\llbracket B? \rrbracket)\}$. 

(4) $\llbracket P; Q \rrbracket = \llbracket P \rrbracket \circ \llbracket Q \rrbracket$. 

$^4$ Strictly speaking, the graphs in $\mathcal{G}^t(\mathcal{C}_A)$ should be considered as abstract graphs, 
that is, as isomorphism classes of graphs. For simplicity we stick to ordinary graphs 
and consider them as representatives for isomorphism classes; see [8, 7] for a precise 
account. 

By clause (3), the operational interpretation of while $B$ do $P$ end is that $P$ is 
executed as long as $B$ occurs as a subgraph. In particular, the loop has no effect 
on a graph $G$ not containing $B$: in this case we have 
$G \mathrel{\llbracket \text{while } B \text{ do } P \text{ end} \rrbracket} H$ if and only if $G = H$. Note also that if $G$ 
contains $B$ but $P$ fails on input $G$, either because a set of rules in $P$ is not 
applicable or because $P$ does not terminate, then the whole loop fails in the sense 
that there is no graph $H$ such that $G \mathrel{\llbracket \text{while } B \text{ do } P \text{ end} \rrbracket} H$. 
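
The clauses of Definition 11 can be tried out on finite relations. In the 
following Python sketch, which is not taken from the paper, relations are sets 
of pairs over a finite universe of states. Integers stand in for graphs, the 
test relation plays the role of the semantics of B? (a partial identity on the 
states "containing B"), and all function names are assumptions of the sketch. 

```python
def compose(phi, psi):
    """Relation composition: first phi, then psi."""
    return {(a, c) for (a, b1) in phi for (b2, c) in psi if b1 == b2}

def domain(phi):
    return {a for (a, _) in phi}

def rt_closure(phi, universe):
    """Reflexive-transitive closure of phi on a finite universe."""
    closure = {(x, x) for x in universe} | set(phi)
    while True:
        bigger = closure | compose(closure, closure)
        if bigger == closure:
            return closure
        closure = bigger

def as_long_as_possible(step, universe):
    """Clause (2): iterate step until it is no longer applicable."""
    return {(x, y) for (x, y) in rt_closure(step, universe)
            if y not in domain(step)}

def while_loop(test, body, universe):
    """Clause (3): iterate test-then-body until the test no longer applies."""
    guarded = compose(test, body)
    return {(x, y) for (x, y) in rt_closure(guarded, universe)
            if y not in domain(test)}

universe = range(6)
test = {(n, n) for n in range(1, 6)}        # "B occurs" for n > 0
body = {(n, n - 1) for n in range(1, 6)}    # one loop iteration
result = while_loop(test, body, universe)

# A body that fails on state 3 makes the loop fail for every input reaching 3.
failing_body = {(n, n - 1) for n in range(1, 6) if n != 3}
partial_result = while_loop(test, failing_body, universe)
```

In this toy instance result relates every state to 0, and state 0 (where the 
test does not apply) is related only to itself. With failing_body, the inputs 
3, 4 and 5 have no result at all, which mirrors the remark that the whole loop 
fails when the body fails. 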

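Consider now subsets $\mathcal{G}_1$ and $\mathcal{G}_2$ of $\mathcal{G}^t(\mathcal{C}_A)$ and a relation 
$\phi \subseteq \mathcal{G}_1 \times \mathcal{G}_2$. We say that a program $P$ computes $\phi$ if 
$\phi = \llbracket P \rrbracket \cap (\mathcal{G}_1 \times \mathcal{G}_2)$, that is, if $\phi$ coincides with the semantics of $P$ 
restricted to $\mathcal{G}_1$ and $\mathcal{G}_2$. This includes the case of partial functions 
$\phi\colon \mathcal{G}_1 \to \mathcal{G}_2$, which are just special relations. 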



7 Dijkstra’s Shortest Path Algorithm 

The so-called single-source shortest path algorithm by Dijkstra [1, 11] computes 
the distances between a given start node and all other nodes in a graph whose 
edges are labelled with nonnegative numbers. Given a graph $G$ and nodes $v$ and 
$w$, a path from $v$ to $w$ is a sequence $e_1, \ldots, e_n$ of edges such that $s_G(e_1) = v$, 
$t_G(e_n) = w$ and $t_G(e_i) = s_G(e_{i+1})$ for $i = 1, \ldots, n - 1$. The distance of such a 
path is the sum of its edge labels. A shortest path between two nodes is a path 
of minimal distance. 

Dijkstra’s algorithm stores the distance from the start node to a node $v$ in 
a variable $d(v)$. Initially, the start node gets the value 0 and every other node 
gets the value $\infty$. Nodes for which the shortest distance has been computed are 
added to a set $S$, which is empty in the beginning. In each step of the algorithm, 
first a node $w$ from $V_G - S$ is added to $S$, where $d(w)$ is minimal. Then for each 
edge $e$ outgoing from $w$, $d(t_G(e))$ is changed to $\min(d(t_G(e)), d(w) + l_{G,E}(e))$. 
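
For comparison with the graph programs below, the algorithm just described can 
be written down directly. The Python sketch is illustrative only: the edge-list 
representation (source, target, label) and the counter of distance changes are 
assumptions of the sketch, and the minimum is found by a plain linear scan 
rather than a priority queue. 

```python
import math

def dijkstra(nodes, edges, start):
    """Single-source shortest distances, following the description above.

    nodes: iterable of node names
    edges: list of (source, target, label) with nonnegative labels
    Returns the distance map d and the number of distance changes.
    """
    nodes = list(nodes)
    d = {v: math.inf for v in nodes}
    d[start] = 0
    S = set()
    changes = 0
    while len(S) < len(nodes):
        # Select a node of minimal distance among those not yet in S.
        w = min((v for v in nodes if v not in S), key=lambda v: d[v])
        S.add(w)
        # Update the targets of all edges outgoing from w.
        for source, target, label in edges:
            if source == w and d[w] + label < d[target]:
                d[target] = d[w] + label
                changes += 1
    return d, changes

# A small example with parallel edges.
example_edges = [(0, 1, 3), (0, 1, 1), (1, 2, 2), (1, 2, 1)]
distances, changes = dijkstra([0, 1, 2], example_edges, 0)
```

Since every edge is inspected exactly once, namely when its source is added to 
S, the number of distance changes never exceeds the number of edges. 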

7.1 A Simple Graph Program for Dijkstra’s Algorithm 

Before giving our graph programs, we specify the signature $\Sigma$ and the algebra 
$A$ of Assumption 1. The programs will store calculated distances as node labels, 
so we need some numerical type for both edge and node labels. Let Real be a 
sort in $\Sigma$, $s_V = s_E = \mathrm{Real}$, and let $\mathbb{R}_+$ be the set of nonnegative real numbers. 
We assume the following operation symbols in $\Sigma$:$^5$ 
$OP_{\lambda,\mathrm{Real}} = \mathbb{R}_+ \cup \{\infty, *, \Box\}$, 
$OP_{\mathrm{Real}\,\mathrm{Real},\mathrm{Bool}} = \{<\}$ and $OP_{\mathrm{Real}\,\mathrm{Real},\mathrm{Real}} = \{+\}$. 
The algebra $A$ is given by $A_{\mathrm{Real}} = \mathbb{R}_+ \cup \{\infty, *, \Box\}$, $c_A = c$ for all 
$c \in OP_{\lambda,\mathrm{Real}}$, $x <_A y = \mathrm{tt}$ if and only if ($x, y \in \mathbb{R}_+$ and $x < y$) or 
($x \neq \infty$ and $y = \infty$), $x +_A y = x + y$ if $x, y \in \mathbb{R}_+$ and $x +_A y = \infty$ otherwise. 

$^5$ Note that all numbers in $\mathbb{R}_+$ are used as constant symbols. The representation of 
numbers in an implementation of our programming language is beyond the scope of 
this paper. 

Our first program for Dijkstra’s algorithm, Simple_Dijkstra, is given in 
Figure 6. We assume that the program is started from a graph in $\mathcal{G}^t(\mathcal{C}_A)$ whose 
edges are labelled with nonnegative numbers and whose start node is marked by 
a unique loop labelled with $*$. The rule schema S_Prepare relabels every node of 
the input graph with $\infty$, S_Start deletes the unique loop and relabels the start 
node with 0, and S_Reduce changes a stored distance whenever a shorter path 
has been found. 



Simple_Dijkstra = S_Prepare↓; S_Start; S_Reduce↓ 

[Figure 6: rule schemata S_Prepare (where $x < \infty$), S_Start and S_Reduce 
(where $(x + y) < z$); the drawings are not recoverable] 

Fig. 6. The program Simple_Dijkstra. 
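
The behaviour of this program can be imitated on a plain edge list. The Python 
sketch below is an approximation for illustration, not the graph program 
itself. It assumes that S_Reduce matches an edge labelled y from a node 
labelled x to a node labelled z with x + y < z and relabels the target with 
x + y, as the textual description and the condition in Figure 6 suggest. The 
nondeterministic choice of a match is modelled by a random generator. 

```python
import math
import random

def simple_dijkstra(nodes, edges, start, rng=random):
    """Imitate S_Prepare (as long as possible), S_Start, S_Reduce (as long as possible).

    edges: list of (source, target, label); returns (labels, number of
    S_Reduce applications).
    """
    label = {v: math.inf for v in nodes}   # effect of S_Prepare on every node
    label[start] = 0                       # effect of S_Start
    applications = 0
    while True:
        applicable = [(s, t, w) for (s, t, w) in edges
                      if label[s] + w < label[t]]
        if not applicable:                 # S_Reduce no longer applicable
            return label, applications
        s, t, w = rng.choice(applicable)   # arbitrary choice of a match
        label[t] = label[s] + w
        applications += 1

edges = [(0, 1, 3), (0, 1, 1), (1, 2, 2), (1, 2, 1)]
labels, count = simple_dijkstra([0, 1, 2], edges, 0, random.Random(0))
```

Whatever the order of choices, the final labels are the shortest distances. 
Only the number of S_Reduce applications depends on the run, and on this small 
input it lies between 2 and 6. 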



Proposition 3 (Correctness of Simple_Dijkstra). Let $G$ be a graph 
in $\mathcal{G}^t(\mathcal{C}_A)$ containing a unique loop $e$, where $l_{G,E}(e) = *$ and $l_{G,E}(e') \in \mathbb{R}_+$ 
for all other edges $e'$. When started from $G$, Simple_Dijkstra terminates and 
produces a unique graph $H$ which is obtained from $G$ by removing $e$ and labelling 
each node $v$ with the shortest distance from $s_G(e)$ to $v$. 

Proof. Termination of Simple_Dijkstra follows from the fact that every 
application of S_Prepare reduces the number of nodes not labelled with $\infty$, and that 
every application of S_Reduce reduces the sum of all node labels in a graph. 

Let now $H$ be a graph such that $G \mathrel{\llbracket \text{Simple\_Dijkstra} \rrbracket} H$. Since there are no 
rule schemata for adding or deleting nodes, and S_Start is the only rule schema 
that alters $G$’s edges, it is clear that $H$ can be obtained from $G$ by removing the 
loop $e$ and relabelling the nodes. Thus, $H$ is uniquely determined if each node 
$v$ is labelled with the shortest distance from $s_G(e)$ to $v$. To show the latter, we 
need the following invariance property. 
Claim. Let $G \mathrel{\llbracket \text{S\_Prepare}{\downarrow}; \text{S\_Start} \rrbracket} H_0 \Rightarrow^*_{\text{S\_Reduce}} H'$. Then for each node 
$v$ in $H'$, either $l_{H',V}(v) = \infty$ or $l_{H',V}(v)$ is the distance of a path from $s_G(e)$ 
to $v$. 

Proof. The proposition holds for $H_0$, because $s_G(e)$ is labelled with 0 and every 
other node is labelled with $\infty$. Moreover, it is easy to see that every application 
of S_Reduce preserves the claimed property. □ 

Suppose now that there is a node $v$ in $H$ such that $l_{H,V}(v)$ is not the shortest 
distance from $s_G(e)$ to $v$. We distinguish two cases. 

Case 1: $v = s_G(e)$. Since $v$ is labelled with 0 after the application of S_Start, 
and $l_{H,V}(v) \neq 0$, there must be an application of S_Reduce that changes $v$'s 
label to a negative number. But this contradicts the above claim. 

Case 2: $v \neq s_G(e)$. By the above claim, there is a path from $s_G(e)$ to $v$ (as 
otherwise $l_{H,V}(v) \neq \infty$). Let $e_1, \ldots, e_n$ be a shortest path from $s_G(e)$ to $v$. Let 
$v_0 = s_G(e)$ and $v_i = t_H(e_i)$ for $i = 1, \ldots, n$. By Case 1, $l_{H,V}(v_0) = 0$. Hence, 
there is some $k$, $1 \le k \le n$, such that $l_{H,V}(v_k)$ is not the shortest distance from 
$v_0$ to $v_k$ and for $i = 0, \ldots, k - 1$, $l_{H,V}(v_i)$ is the shortest distance from $v_0$ to 
$v_i$. Now since $e_1, \ldots, e_n$ is a shortest path to $v_n$ it follows that $e_1, \ldots, e_k$ is a 
shortest path to $v_k$ and that $e_1, \ldots, e_{k-1}$ is a shortest path to $v_{k-1}$. So the 
shortest distance from $v_0$ to $v_k$ is 
$\sum_{i=1}^{k-1} l_{H,E}(e_i) + l_{H,E}(e_k) = l_{H,V}(v_{k-1}) + l_{H,E}(e_k)$. As this sum 
is smaller than $l_{H,V}(v_k)$, S_Reduce is applicable to $e_k$. But this contradicts the 
fact that $H \notin \mathrm{Dom}(\Rightarrow_{\text{S\_Reduce}})$. □ 

The correctness of Simple_Dijkstra was easy to show; however, the program 
can be expensive in the number of applications of the rule schema S_Reduce. For 
example, the right-hand derivation sequence in Figure 7 contains 48 applications 
of S_Reduce and represents the worst-case program run for the given input graph 
of 5 nodes. In contrast, Dijkstra’s algorithm (as sketched at the beginning of 
this section) changes distances only 10 times when applied to the same graph. 
Although Simple_Dijkstra needs only 4 applications of S_Reduce in the best 
case, there is no guarantee that it does not choose the worst case. We therefore 
refine Simple_Dijkstra by modelling the original algorithm more closely. 

7.2 A Refined Program 

The program Dijkstra of Figure 8 uses a while-loop to repeatedly select a 
node of minimal distance and to update the distances of the target nodes of 
the outgoing edges of that node. Nodes that have not yet been selected are 
marked by a □-labelled loop. Removing the $*$-labelled loop from a node by Next 
corresponds to adding that node to the set $S$ of the original algorithm. Note 
that Dijkstra is essentially deterministic: Min↓ always determines a node of 
minimal distance among all nodes marked with loops, and Reduce is applied 
only to edges outgoing from this node. 

The left-hand derivation sequence of Figure 7 is a worst-case run of Dijkstra, 
containing 26 rule-schema applications. Among these are only 10 applications of 
Reduce, which correspond to the 10 distance changes done by the original 
algorithm. The next proposition establishes the worst-case complexity of Dijkstra 
in terms of the number of rule-schema applications, where we assume that input 
graphs satisfy the precondition of Proposition 3. 

[Figure 7: left-hand derivation sequence of Dijkstra and right-hand derivation 
sequence of Simple_Dijkstra on the same input graph; the drawings are not 
recoverable] 

Fig. 7. Derivation sequences of Simple_Dijkstra and Dijkstra. 

Proposition 4 (Complexity of Dijkstra). When started from a graph 
containing $n$ nodes and $e$ edges, Dijkstra terminates after $O(n^2 + e)$ rule-schema 
applications. 
Dijkstra = Prepare↓; Start; while B do Min↓; Reduce↓; Next end; CleanUp 

[Figure 8: the graph B (a node with a □-labelled loop) and the rule schemata of 
the program, among them Prepare (where $\neg\,\mathrm{edge}(1, 1)$), Start and CleanUp; the 
drawings are not recoverable] 

Fig. 8. The program Dijkstra. 
Proof. The initialisation phase Prepare↓; Start uses $n$ rule-schema 
applications. The body of the while-loop is executed $n - 1$ times because initially 
there are $n - 1$ loops labelled with □, and each execution of the body reduces 
this number by one. So the overall number of Next-applications is $n - 1$, too. 
Each execution of Min↓ takes at most $n - 1$ steps because there is only one 
$*$-labelled loop. Hence, there are at most $(n - 1)^2$ applications of Min overall. The 
total number of Reduce-applications is at most $e$ since Reduce cannot be applied 
twice to the same edge. This is because Reduce is applied only to edges outgoing 
from the $*$-marked node, and the $*$ mark is removed by Next. Thus, a bound 
for the overall number of rule-schema applications is $n + (n - 1) + (n - 1)^2 + e$, 
which is in $O(n^2 + e)$. □ 

Note that if we forbid parallel edges in input graphs, then $e$ is bounded by 
$n^2$ and hence the complexity of Dijkstra is $O(n^2)$. 
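
As a short worked step (added here for illustration), the bound from the proof 
of Proposition 4 simplifies exactly, which makes both asymptotic statements 
immediate: 

```latex
\begin{align*}
n + (n - 1) + (n - 1)^2 + e
  &= (2n - 1) + (n^2 - 2n + 1) + e \\
  &= n^2 + e, \\
\text{and, if } e \le n^2: \qquad n^2 + e &\le 2n^2 .
\end{align*}
```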



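[Figure 9: a chain of nodes in which consecutive nodes are connected by two 
parallel edges, one labelled $2^{n-2} + 1, \ldots, 5, 3, 2$ (from left to right) and 
one labelled 1; the drawing is not recoverable] 

Fig. 9. A worst-case input for Simple_Dijkstra. 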



The quadratic complexity of Dijkstra means a drastic improvement on the 
running time of Simple_Dijkstra which may be exponential. More precisely, 
one can show that for every $n \ge 2$ there is a graph with $n$ nodes and $2(n - 1)$ 
edges such that there is a run of Simple_Dijkstra in which the rule schema 
S_Reduce is applied $\sum_{k=1}^{n-1} 2^k$ times. Such a graph is shown in Figure 9. (The 
running time of Dijkstra for this graph is actually linear.) 
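
The exponential behaviour can be checked by brute force on small instances. 
The Python sketch below rests on two assumptions that are not confirmed by the 
text: the input is read off Figure 9 as a chain of n nodes in which node i and 
node i + 1 are joined by two parallel edges labelled 2^(n-2-i) + 1 and 1, and 
S_Reduce relabels the target z of an edge labelled y from a node labelled x 
with x + y whenever x + y < z. Under these assumptions the longest possible 
sequence of S_Reduce applications is found by exhaustive search. 

```python
import math
from functools import lru_cache

def worst_case_reductions(n):
    """Length of the longest S_Reduce sequence on the assumed Figure 9 chain."""
    edges = []
    for i in range(n - 1):
        edges.append((i, i + 1, 2 ** (n - 2 - i) + 1))  # "expensive" edge
        edges.append((i, i + 1, 1))                      # "cheap" edge

    @lru_cache(maxsize=None)
    def longest(labels):
        best = 0
        for s, t, w in edges:
            if labels[s] + w < labels[t]:                # S_Reduce applicable
                new_labels = list(labels)
                new_labels[t] = labels[s] + w
                best = max(best, 1 + longest(tuple(new_labels)))
        return best

    start = (0,) + (math.inf,) * (n - 1)  # labels after the initialisation phase
    return longest(start)
```

For n = 2, 3, 4, 5 the search yields 2, 6, 14 and 30 applications, that is 
2 + 4 + ... + 2^(n-1) = 2^n - 2, in line with the exponential worst case 
stated above. 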

8 Related Work 

A guiding principle in our ongoing design of the graph programming language 
GP is syntactic and semantic simplicity, which distinguishes GP from the com- 
plex PROGRES language [15]. It remains to be seen how much we have to 
compromise this principle to enable practical programming in application areas. 
Our approach also differs from a language such as AGG [4] in that we insist on 
a formal semantics. We want GP to be semantics-based since we consider the 
ability to formally reason on programs as a key feature. 

The rule schemata introduced in this paper are not the only way to extend 
graph transformation with calculations on labels. An alternative is to use one of 
the approaches to attributed graph transformation that have been proposed in 
the literature. The recent papers [10, 3], for example, merge graphs and algebras 
so that attributed graphs are usually infinite. We rather prefer to work with 
finite graphs in which “attributes” are ordinary labels. 

Our method of working with rule schemata and their instances is close to 
Schied’s approach to double-pushout transformations on graphs labelled with 
algebra elements [14]. (A single-pushout version of this approach is outlined 
in [13].) Roughly, his double-pushout diagrams can be decomposed into our 
diagrams with rule schema instantiations on top of them. A major difference 
between the present paper and [14] is that our rules can relabel and merge items 
whereas the rules in [14] are label preserving and injective. Schied also introduces 
conditions for rules, in the form of propositional formulas over term equations, 
but he does not consider built-in predicates on the graph structure such as our 
edge predicate. 

Surprisingly, there seems to be hardly any work on studying graph algorithms 
in the framework of graph transformation languages. We are only aware of a 
case study on Floyd’s all-pairs shortest path algorithm in Kreowski and Kuske’s 
paper [12]. The paper presents a program for Floyd’s algorithm and proves its 
correctness as well as a cubic bound for the number of rule applications. (The 
program consists of rules with parameters, similar to our rule schemata, but [12] 
does not give a general formalism for such rules.) 

9 Conclusion 

As pointed out in the Introduction, this paper is only the first step in extending 
the language of [7] to a graph programming language GP. We have introduced 
graph programs over rule schemata to incorporate numerical data and other basic 
data types. Rule schemata can have Boolean application conditions which may 
contain built-in predicates on the graph structure. We have identified determin- 
istic conditional rule schemata as a class of schemata that admit a reasonable 
implementation in that their applicability and the graphs resulting from ap- 
plications are uniquely determined by premorphisms from left-hand sides into 
graphs. As a case study for extended graph programs, we have given two pro- 
grams for Dijkstra’s shortest path algorithm and have analysed their correctness 
and complexity. 

In future work, more case studies on graph algorithms and in other areas 
will be pursued to find out what additional programming constructs are needed 
to make GP a practical language. We hope that new constructs can be mapped 
to a small core of GP - possibly the language used in this paper - to keep 
the semantics comprehensible and to facilitate formal reasoning on programs, 
static program analysis, program transformation, etc. And, of course, GP should 
eventually be implemented so that its practical usefulness can be proved. 
Abstract. Adhesive high-level replacement (HLR) categories and systems 
are introduced as a new categorical framework for graph transformation 
in a broad sense, which combines the well-known concept of HLR 
systems with the new concept of adhesive categories introduced by Lack 
and Sobocinski. 

In this paper we show that most of the HLR properties, which had been 
introduced ad hoc to generalize some basic results from the category 
of graphs to high-level structures, are valid already in adhesive HLR 
categories. As a main new result in a categorical framework we show the 
Critical Pair Lemma for local confluence of transformations. Moreover we 
present a new version of embeddings and extensions for transformations 
in our framework of adhesive HLR systems. 



1 Introduction 

High-level replacement systems have been introduced in [1] to generalize the 
well-known double pushout approach from graphs [2] to various kinds of 
high-level structures, including also algebraic specifications and Petri nets. In order 
to generalize basic results, like the local Church-Rosser, parallelism and 
concurrency theorems, several different conditions have been introduced in [1], called 
HLR conditions. The theory of HLR systems has been applied to a large number 
of example categories, where all the HLR conditions have been verified explicitly. 
Unfortunately, however, these conditions have some kind of ad hoc character, 
because they are just a collection of all the properties which are used in the 
categorical proofs of the basic results. Up to now it has not been analyzed how 
far these HLR properties are independent of each other or are consequences 
of a more general principle. 

This problem concerning the ad hoc character of the HLR conditions has 
been solved recently by Lack and Sobocinski in [3] by introducing the notion 
of adhesive categories. They have shown that the concept of “van Kampen 
squares” (VK squares for short), known from topology [4], can be considered as such 
a general principle. Roughly speaking, a VK square is a pushout square which is 
stable under pullbacks. The key idea of adhesive categories is the requirement 
that pushouts along monomorphisms are VK squares. This property is valid not 
only in the categories Sets and Graph, but also in several varieties of graphs, 
which have been used in the theory of graph grammars and graph transformation 
[5] up to now. On the other hand Lack and Sobocinski were able to show in [3] 
that most of the ad hoc HLR conditions required in [1] can be shown for adhesive 
categories. Together with the results in [1] this implies that the basic results 
for the theory of graph transformation mentioned above are valid in adhesive 
categories, where only the Parallelism Theorem requires in addition the existence 
of binary coproducts. 

Unfortunately the concept of adhesive categories incorporates an important 
restriction, which rules out several interesting application categories. The HLR 
framework in [1] is based on a distinguished class $\mathcal{M}$ of morphisms, which is 
restricted to the class of all monomorphisms in adhesive categories. This restriction 
rules out the category (SPEC, $\mathcal{M}$) of all algebraic specifications with class $\mathcal{M}$ of 
all strict injective specification morphisms (see [1]) and several other integrated 
specification techniques like algebraic high-level nets [6, 7] and different kinds of 
attributed graphs [8, 9], which are important in the area of graph transformation 
and HLR systems. 

In this paper we combine the advantages of HLR and of adhesive categories 
by introducing the new concept of “adhesive HLR categories”. Roughly speaking, 
an adhesive HLR category is an adhesive category with a suitable subclass $\mathcal{M}$ of 
monomorphisms, which is closed under pushouts and pullbacks. As main results 
of this paper we are able to show that adhesive HLR categories are closed under 
product, slice, coslice and functor category constructions and that most of the 
important HLR properties of [1] are valid. These results are generalizations of 
corresponding results in [3], where we remove the restrictions that $\mathcal{M}$ is the 
class of all monomorphisms and that adhesive categories in [3] are required to 
have all pullbacks instead of pullbacks along $\mathcal{M}$-morphisms only. 

In sections 2 - 4 of this paper we review and recover the basic results for 
HLR systems in [1] and adhesive grammars in [3] in the framework of adhesive 
HLR categories and systems. Moreover, we present in section 5 a new version of 
the results for embedding and extension of transformations [2, 10, 11]. This is the 
basis to show in section 6 another main result of this paper: For the first time 
we present a categorical version of the Critical Pair Lemma for local confluence 
of transformations, discussed for hypergraphs in [12] and attributed graphs in 
[9], in our new framework of adhesive HLR systems. 

For lack of space we only give proof ideas for some of our results in this 
paper. For a more detailed version we refer to our technical report [13]. 

2 Review of Van Kampen Squares 
and Adhesive Categories 

In this section we review adhesive categories as introduced by Lack and 
Sobocinski in [3]. 

The basic notion of adhesive categories is that of a so-called van Kampen 
square. The intuitive idea of a van Kampen square is that of a pushout which 
is stable under pullbacks and where, vice versa, pushout preservation implies 
pullback stability. The name van Kampen derives from the relationship between these 
squares and the Van Kampen Theorem in topology [4]. 

Definition 1 (van Kampen square). A pushout (1) is a van Kampen (VK) 
square, if for any commutative cube (2) with (1) in the bottom and the back faces 
being pullbacks the following holds: the top face is a pushout $\Leftrightarrow$ the front faces 
are pullbacks. 




In the definition of adhesive categories only those VK squares are considered, 
where m is a monomorphism. In this case the square is called a pushout along a 
monomorphism. The first interesting property of VK squares in [3] shows that 
in this case also n is a monomorphism and the square is also a pullback. 
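
In the category Sets this property can be observed directly on finite examples. 
The following Python sketch is an illustration and not part of the paper: it 
assumes a span B <- A -> C given by dictionaries m (injective) and f, builds 
the pushout object as the disjoint union of C with the part of B outside the 
image of m, and then checks that the resulting square is also a pullback. All 
names are hypothetical. 

```python
def pushout_along_mono(A, B, C, m, f):
    """Pushout of B <-m- A -f-> C in finite sets, for injective m.

    Returns (D, g, n) with g: B -> D and n: C -> D as dictionaries.
    Elements of D are tagged to keep the union disjoint.
    """
    assert len({m[a] for a in A}) == len(A), "m must be injective"
    preimage = {m[a]: a for a in A}
    D = {("C", c) for c in C} | {("B", b) for b in B if b not in preimage}
    n = {c: ("C", c) for c in C}
    g = {b: ("C", f[preimage[b]]) if b in preimage else ("B", b) for b in B}
    return D, g, n

def is_pullback(A, B, C, m, f, g, n):
    """Check that A with (m, f) is a pullback of g and n in finite sets."""
    fibre = {(b, c) for b in B for c in C if g[b] == n[c]}
    induced = [(m[a], f[a]) for a in A]
    return len(set(induced)) == len(induced) and set(induced) == fibre

A, B, C = {1, 2}, {"a", "b", "c"}, {"x"}
m = {1: "a", 2: "b"}          # injective
f = {1: "x", 2: "x"}          # not injective
D, g, n = pushout_along_mono(A, B, C, m, f)
```

Here D has two elements, the map n is injective even though f is not, and 
is_pullback(A, B, C, m, f, g, n) holds, as the property quoted from [3] 
predicts. 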

Definition 2 (adhesive category). A category C is an adhesive category, if 

1. C has pushouts along monomorphisms, i.e. pushouts, where at least one of 
the given morphisms is a monomorphism, 

2. C has pullbacks, 

3. pushouts along monomorphisms are VK squares. 

The most basic example of an adhesive category is the category Sets of sets. 
Moreover it is shown in [3] that adhesive categories are closed under product, 
slice, coslice and functor category construction. This implies immediately that 
also the category Graphs of graphs $G = (E \overset{s,t}{\rightrightarrows} V)$, and also several variants like 
typed graphs, labelled graphs and hypergraphs are adhesive categories. This is 
a first indication that adhesive categories are suitable for graph transformation. 
Counterexamples for adhesive categories are Pos (partially ordered sets), Top 
(topological spaces), Gpd (groupoids) and Cat (categories), where pushouts 
along monomorphisms fail to be VK squares (see [3]). 

The main reason why adhesive categories are important for the theory of 
graph transformation and its generalization to high-level replacement systems 
(see [1]) is the fact that most of the HLR conditions required in [1] are shown to 
be valid already in adhesive categories (see [3]). This implies that basic results 
like the Local Church-Rosser Theorem and the Concurrency Theorem (see [1]) 
are valid already in the framework of adhesive categories, while the Parallelism 
Theorem needs in addition the existence of binary coproducts. 

The main advantage of adhesive categories compared with HLR categories 
in [1] is the fact that the requirements for adhesive categories are much 
smoother than the variety of different HLR conditions in [1], which have been 
stated “ad hoc” as needed in the categorical proofs of the corresponding results 
mentioned above. 

On the other hand HLR categories in [1] are based on a class $\mathcal{M}$ of morphisms, 
which is restricted to the class of all monomorphisms in adhesive categories. This 
rules out several interesting examples. In order to avoid this problem we combine 
the two concepts leading to the notion of adhesive HLR categories in the next 
section. 

3 Adhesive HLR Categories 

As motivated in the previous section we will combine the concepts of adhesive 
categories [3] and HLR categories [1] leading to the new concept of adhesive 
HLR categories in this section. Most of the results presented in this section 
are generalizations of results for adhesive categories in [3], but we present new 
interesting examples which are not instantiations of adhesive categories. 

The main difference of adhesive HLR categories compared with adhesive 
categories is the fact that we consider a suitable subclass $\mathcal{M}$ of monomorphisms 
instead of the class of all monomorphisms. Moreover we require only pullbacks 
along $\mathcal{M}$-morphisms and not for general morphisms. 

Definition 3 (adhesive HLR category (C, $\mathcal{M}$)). A category C with a 
morphism class $\mathcal{M}$ is called adhesive HLR category, if 

1. $\mathcal{M}$ is a class of monomorphisms closed under isomorphisms and closed under 
composition ($f\colon A \to B \in \mathcal{M}$, $g\colon B \to C \in \mathcal{M}$ $\Rightarrow$ $g \circ f \in \mathcal{M}$) and 
decomposition ($g \circ f \in \mathcal{M}$, $g \in \mathcal{M}$ $\Rightarrow$ $f \in \mathcal{M}$), 

2. C has pushouts and pullbacks along $\mathcal{M}$-morphisms and $\mathcal{M}$-morphisms are 
closed under pushouts and pullbacks, 

3. pushouts in C along $\mathcal{M}$-morphisms are VK squares. 

Remark 1. Most of the results in this paper can also be formulated under slightly 
weaker assumptions, where the existence of pullbacks is required only if both 
given morphisms are in $\mathcal{M}$ and pushouts along $\mathcal{M}$-morphisms are required to be 
$\mathcal{M}$-VK squares only, i.e. only for the case $f \in \mathcal{M}$ or $a, b, d \in \mathcal{M}$. This weaker 
version is called “weak adhesive HLR category”. But presently we have no 
interesting example of this weak case that is not also an adhesive HLR category. 

Example 1. 

1. All examples of adhesive categories are adhesive HLR categories 
for the class $\mathcal{M}$ of all monomorphisms. As shown in [3] this includes the 
category Sets of sets, Graphs of graphs and several variants of graphs like 
typed, labelled and hypergraphs discussed above. Moreover this includes the 
category PT-Net of place transition nets considered in [1]. 

2. The category (Spec, $\mathcal{M}_1$) of algebraic specifications with class $\mathcal{M}_1$ of all 
monomorphisms is not adhesive, because pushouts along monomorphisms 
are not necessarily pullbacks. But (Spec, $\mathcal{M}_2$) with class $\mathcal{M}_2$ of all strict 
injective specification morphisms is an HLR2 category in the sense of [1] 
and also an adhesive HLR category. 
For similar reasons the category AHL-Net of algebraic high-level nets (see 
[7]) has to be considered with strict injective specification morphisms 
concerning the specification part of the net morphism. 

3. An important new example is the category (AGraphsATG, M) of typed 
attributed graphs with type graph ATG and class M of all injective mor- 
phisms with isomorphisms on the data part. In our paper [14] we explicitly 
show that this is an adhesive HLR category satisfying all additional HLR 
properties considered later in this paper. 

The first important result shows that adhesive HLR categories are closed 
under product, slice, coslice and functor category construction. This allows us to
construct new examples from given ones. 

Theorem 1 (construction of adhesive HLR categories). Adhesive HLR 
categories can be constructed as follows: 

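- If ($\mathbf{C}$, $M_1$) and ($\mathbf{D}$, $M_2$) are adhesive HLR, then
  ($\mathbf{C} \times \mathbf{D}$, $M_1 \times M_2$) is adhesive HLR.

- If ($\mathbf{C}$, $M$) is adhesive HLR, then so are the slice category
  ($\mathbf{C} \backslash C$, $M \cap \mathbf{C} \backslash C$) and the coslice
  category ($C \backslash \mathbf{C}$, $M \cap C \backslash \mathbf{C}$) for any
  object $C$ in $\mathbf{C}$.

- If ($\mathbf{C}$, $M$) is adhesive HLR, then every functor category
  ($[\mathbf{X}, \mathbf{C}]$, $M$-functor transformations) is adhesive HLR.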

Remark 2. An M-functor transformation is a natural transformation $t : F \to G$
where all morphisms $t(X) : F(X) \to G(X)$ are in M.

Proof idea. In the case of product and functor categories the properties of
adhesive HLR categories can be shown componentwise. For slice and coslice
categories some standard constructions for pushouts and pullbacks can be used
to show the properties. □
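
To make the notion of an M-functor transformation concrete, the sketch below relies on the standard view of directed graphs as functors from the two-object category with parallel arrows (source, target) into Sets; this view is an assumption added here and is not spelled out in the text above. A graph morphism is then a natural transformation, and it is an M-functor transformation (for M the injective functions) exactly when both components are injective. The dictionary layout and function names are hypothetical.

```python
def is_injective(f):
    """A finite function (dict) is injective if no two keys share a value."""
    return len(set(f.values())) == len(f)


def is_natural(G, H, fV, fE):
    """Check that the pair (fV, fE) commutes with the source and target maps."""
    return all(
        fV[G["src"][e]] == H["src"][fE[e]] and fV[G["tgt"][e]] == H["tgt"][fE[e]]
        for e in G["E"]
    )


def is_M_functor_transformation(G, H, fV, fE):
    """Natural transformation whose node and edge components are both injective."""
    return is_natural(G, H, fV, fE) and is_injective(fV) and is_injective(fE)


# G: a single edge a : 1 -> 2.  H: a path p -x-> q -y-> r.
G = {"V": {1, 2}, "E": {"a"}, "src": {"a": 1}, "tgt": {"a": 2}}
H = {"V": {"p", "q", "r"}, "E": {"x", "y"},
     "src": {"x": "p", "y": "q"}, "tgt": {"x": "q", "y": "r"}}

embedding = is_M_functor_transformation(G, H, {1: "p", 2: "q"}, {"a": "x"})

# Collapsing G onto a loop is natural, but its node component is not injective.
LOOP = {"V": {"n"}, "E": {"l"}, "src": {"l": "n"}, "tgt": {"l": "n"}}
collapse_natural = is_natural(G, LOOP, {1: "n", 2: "n"}, {"a": "l"})
collapse_in_M = is_M_functor_transformation(G, LOOP, {1: "n", 2: "n"}, {"a": "l"})
```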

The second important result shows that most of the HLR conditions stated 
in [1,7] are already valid in adhesive HLR categories. 

Theorem 2 (HLR properties of adhesive HLR categories). Given an 
adhesive HLR category (C, M), the following HLR conditions are satisfied. 

1. Pushouts along M -morphisms are pullbacks. 

2. Pushout-pullback decomposition: Given the following diagram with $l, w \in M$,
(1) + (2) pushout and (2) pullback. Then (1) and (2) are pushouts and also
pullbacks.

3. Cube pushout-pullback property: Given the following commutative cube (3),
where all morphisms in top and bottom are in M, the top is pullback and the
front faces are pushouts. Then we have: the bottom is pullback
$\Leftrightarrow$ the back faces of the cube are pushouts.

[Diagram: squares (1) and (2) with top row $A \xrightarrow{k} B \xrightarrow{r} E$,
bottom row $C \xrightarrow{u} D \xrightarrow{w} F$ and vertical morphisms
$l : A \to C$, $s : B \to D$, $v : E \to F$. The commutative cube (3) is not
reproduced here.]




4. Uniqueness of pushout complements for M-morphisms: Given
$k : A \to B \in M$ and $s : B \to D$ then there is up to isomorphism at
most one $C$ with $l : A \to C$ and $u : C \to D$ such that diagram (1) is a
pushout.

Proof idea. These properties are shown for adhesive categories with class M of 
all monomorphisms in [3]. The proofs can be reformulated for any subclass M 
of monomorphisms as required for adhesive HLR categories. □ 
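
Property 1 can be checked on small instances in the category Sets, assuming M is the class of injective functions. The sketch below builds the pushout of a span of finite sets as a quotient of the disjoint union and then tests whether the resulting square is also a pullback. The function names and the dictionary encoding of functions are hypothetical, and one example per case is an illustration rather than a proof.

```python
def pushout(A, B, C, k, l):
    """Pushout of B <-k- A -l-> C in finite sets.

    Returns (D, s, u) with s : B -> D and u : C -> D, where D is the
    disjoint union of B and C with k(a) and l(a) identified for all a in A.
    """
    elems = [("B", b) for b in B] + [("C", c) for c in C]
    parent = {x: x for x in elems}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for a in A:
        parent[find(("B", k[a]))] = find(("C", l[a]))

    cls = {x: frozenset(y for y in elems if find(y) == find(x)) for x in elems}
    D = set(cls.values())
    s = {b: cls[("B", b)] for b in B}
    u = {c: cls[("C", c)] for c in C}
    return D, s, u


def pullback(B, C, s, u):
    """Pullback of B -s-> D <-u- C: all pairs (b, c) with s(b) = u(c)."""
    return {(b, c) for b in B for c in C if s[b] == u[c]}


def is_pullback(A, B, C, k, l, s, u):
    """The square is a pullback iff a |-> (k(a), l(a)) is a bijection onto it."""
    image = [(k[a], l[a]) for a in A]
    return len(set(image)) == len(A) and set(image) == pullback(B, C, s, u)


# k injective (in M), l arbitrary: the pushout square is also a pullback.
A, B, C = {1, 2}, {"b1", "b2", "b3"}, {"c"}
k, l = {1: "b1", 2: "b2"}, {1: "c", 2: "c"}
D, s, u = pushout(A, B, C, k, l)
along_mono = is_pullback(A, B, C, k, l, s, u)

# Neither morphism injective: the pushout square fails to be a pullback.
A2, B2, C2 = {1, 2}, {"b"}, {"c"}
k2, l2 = {1: "b", 2: "b"}, {1: "c", 2: "c"}
D2, s2, u2 = pushout(A2, B2, C2, k2, l2)
along_non_mono = is_pullback(A2, B2, C2, k2, l2, s2, u2)
```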

Remark 3. The HLR conditions stated above together with the existence of bi- 
nary coproducts compatible with M (see Thm. 3) correspond roughly to the 
HLR conditions in [7] resp. HLR2 and two of the HLR2* conditions in [1]. The
HLR2 condition of [1] stating that M is closed under isomorphisms is not needed 
in our context. 



4 Adhesive HLR Systems 

In this section we use the concept of adhesive HLR categories introduced in the
previous sections to present the basic notions and results of adhesive HLR
systems in analogy to HLR systems in [1]. The Local Church-Rosser Theorem and
the Parallelism Theorem are shown to be valid in [1] for HLR1 categories, and
the Concurrency Theorem for HLR2 categories, where the existence of binary
coproducts is only needed for the Parallelism Theorem. Using the properties of
adhesive HLR categories in the previous section we can immediately conclude
that the Local Church-Rosser Theorem and the Concurrency Theorem are valid
in adhesive HLR categories and the Parallelism Theorem in adhesive HLR
categories with binary coproducts.

Definition 4 (adhesive HLR system). An adhesive HLR system
AS = (C, M, S, P) consists of an adhesive HLR category (C, M), a start object
S and a set of productions P, where

1. a production $p = (L \xleftarrow{l} K \xrightarrow{r} R)$ consists of objects
L, K and R called left-hand side, gluing object and right-hand side
respectively, and morphisms $l : K \to L$, $r : K \to R$ with $l, r \in M$,

2. a direct transformation $G \Rightarrow H$ via a production p and a morphism
$m : L \to G$, called match, is given by the following diagram, called
DPO-diagram, where (1) and (2) are pushouts,

[DPO-diagram: top row $L \xleftarrow{l} K \xrightarrow{r} R$, bottom row
$G \leftarrow D \rightarrow H$, vertical morphisms $m : L \to G$, $k : K \to D$
and the induced morphism $R \to H$; the left square is (1), the right square
is (2).]



3. a transformation is a sequence $G_0 \Rightarrow G_1 \Rightarrow \ldots \Rightarrow G_n$
of direct transformations, written $G_0 \Rightarrow^* G_n$,

4. the language L(AS) consists of all objects G in C derivable from the start
object S by a transformation, i.e. $L(AS) = \{G \mid S \Rightarrow^* G\}$.
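
The two pushouts of a direct transformation can be sketched for plain directed graphs under simplifying assumptions that are not part of the definition above: the production morphisms l and r are inclusions of K into L and R, the match is injective, and the fresh names `("new", x)` do not clash with names already used in G. The function `dpo_step` and the graph encoding are hypothetical. The sketch first removes the image of L \ K (the pushout complement D, provided that no dangling edge would remain) and then glues R \ K onto D.

```python
def dpo_step(G, L, K, R, m_nodes, m_edges):
    """Apply the production L <- K -> R (K included in L and R) at an injective match.

    Graphs are dicts {"nodes": set, "edges": {edge_id: (source, target)}}.
    Returns the result graph H, or None if the dangling condition fails.
    """
    del_nodes = {m_nodes[n] for n in L["nodes"] - K["nodes"]}
    del_edges = {m_edges[e] for e in L["edges"].keys() - K["edges"].keys()}

    # Dangling condition: no remaining edge may touch a deleted node.
    for e, (src, tgt) in G["edges"].items():
        if e not in del_edges and (src in del_nodes or tgt in del_nodes):
            return None

    # Pushout (1) read backwards: the context graph D is G without m(L \ K).
    D_nodes = G["nodes"] - del_nodes
    D_edges = {e: st for e, st in G["edges"].items() if e not in del_edges}

    # Pushout (2): glue the new part R \ K onto D along K.
    comatch = {n: m_nodes[n] for n in K["nodes"]}
    H_nodes = set(D_nodes)
    for n in R["nodes"] - K["nodes"]:
        comatch[n] = ("new", n)
        H_nodes.add(("new", n))
    H_edges = dict(D_edges)
    for e in R["edges"].keys() - K["edges"].keys():
        src, tgt = R["edges"][e]
        H_edges[("new", e)] = (comatch[src], comatch[tgt])
    return {"nodes": H_nodes, "edges": H_edges}


# Production: replace the edge a : 1 -> 2 by a path 1 -> 3 -> 2.
L = {"nodes": {1, 2}, "edges": {"a": (1, 2)}}
K = {"nodes": {1, 2}, "edges": {}}
R = {"nodes": {1, 2, 3}, "edges": {"b": (1, 3), "c": (3, 2)}}
G = {"nodes": {"x", "y", "z"}, "edges": {"e1": ("x", "y"), "e2": ("y", "z")}}
H = dpo_step(G, L, K, R, {1: "x", 2: "y"}, {"a": "e1"})

# Production deleting node 2: not applicable here, edge e2 would dangle.
K2 = {"nodes": {1}, "edges": {}}
R2 = {"nodes": {1}, "edges": {}}
H_dangling = dpo_step(G, L, K2, R2, {1: "x", 2: "y"}, {"a": "e1"})
```

For non-injective matches an identification condition would have to be checked as well; the sketch deliberately leaves that case out.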

Remark 4. 1. An adhesive HLR system is on the one hand an HLR system in
the sense of [1], where in [1] we have in addition a distinguished class T of 
terminal objects, and on the other hand an adhesive grammar in the sense 
of [3], provided that the class M is the class of all monomorphisms. 

2. A direct transformation $G \Rightarrow H$ is uniquely determined up to isomorphism
by the production p and the match m, because due to Thm. 2 It. 4 pushout
complements along M-morphisms in adhesive HLR categories are unique up
to isomorphism.

3. All the examples for HLR1 and HLR2 systems considered in [1] and all
systems over adhesive HLR categories considered in Ex. 1 are adhesive HLR
systems, which includes especially the classical graph transformation
approach in [2].

The following basic results are shown for HLR2 categories in [1] and they are
rephrased for adhesive categories in [3]. According to Thm. 2 they are also valid
for adhesive HLR systems. A more detailed version of these results is presented
in [13].

Theorem 3 (Local Church-Rosser, Parallelism and Concurrency Theorem).
The Local Church-Rosser Theorems I and II, the Parallelism Theorem
and the Concurrency Theorem as stated in [1] are valid for all adhesive HLR
systems AS = (C, M, S, P). Only for the Parallelism Theorem we have to
require in addition that (C, M) has binary coproducts which are compatible with
M, i.e. $m_1, m_2 \in M$ implies $m_1 + m_2 \in M$.

Proof. Follows from [1] and Thm. 2. □ 

5 Embedding and Extension
of Adhesive HLR Transformations

In this section we present a categorical version of the Embedding Theorem for 
graph transformation (see [2]) using the concept of initial pushouts first intro- 
duced in [10]. The embedding theorem is not only important for the theory of 
graph transformation, but also for the component framework for system mod- 
elling introduced in [15]. In [11] it is shown how to verify the extension properties 
used in the generic component concept of [15] in the framework of HLR systems. 
The Embedding Theorem and the Extension Theorem presented for adhesive
HLR systems in this section combine the results for both areas and will also be 
used in the next section to show the Local Confluence Theorem. The key notion 
is the concept of initial pushouts, which formalizes the construction of boundary 
and context in [2]. The important new property going beyond [10] is the fact that 
initial pushouts are closed under double pushouts under certain conditions. As 
in Sec. 4 we assume also in this section that we have an adhesive HLR system. 

We start with the definition of an extension diagram in the sense of [15, 11] 
which means that a transformation t is extended to a transformation t' via an 
extension morphism. 

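Definition 5 (extension diagram). An extension diagram is a diagram (1)

[Diagram (1): a square with $t : G_0 \Rightarrow^* G_n$ on top,
$t' : G'_0 \Rightarrow^* G'_n$ at the bottom and vertical morphisms
$k_0 : G_0 \to G'_0$ and $k_n : G_n \to G'_n$.]

where $k_0 : G_0 \to G'_0$ is a morphism, called extension morphism, and
$t : G_0 \Rightarrow^* G_n$ and $t' : G'_0 \Rightarrow^* G'_n$ are transformations
via the same productions $(p_0, \ldots, p_{n-1})$ and matches
$(m_0, \ldots, m_{n-1})$ resp. $(k_0 \circ m_0, \ldots, k_{n-1} \circ m_{n-1})$
defined by the following DPOs.

[Diagram: for each $i = 0, \ldots, n-1$ the production
$p_i : L_i \leftarrow K_i \rightarrow R_i$ with match $m_i : L_i \to G_i$, the
double pushout over $G_i \leftarrow D_i \rightarrow G_{i+1}$, and below it the
double pushout over $G'_i \leftarrow D'_i \rightarrow G'_{i+1}$ with vertical
morphisms $k_i : G_i \to G'_i$ and $k_{i+1} : G_{i+1} \to G'_{i+1}$.]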



Remark 5. 1. The extension diagram (1) is completely determined (up to
isomorphism) by $t : G_0 \Rightarrow^* G_n$ and $k_0 : G_0 \to G'_0$ (using the
uniqueness of pushout complements).

2. Extension diagrams are closed under horizontal and vertical composition 
(using corresponding composition properties of pushouts). 

The main problem is now to determine under which condition a transformation
$t : G_0 \Rightarrow^* G_n$ and an extension morphism $k_0 : G_0 \to G'_0$ lead to
an extension diagram. The key notion is that of an initial pushout, which will
be required for the extension morphism $k_0$ in the consistency condition below.

Definition 6 (initial pushout, boundary and context). Given $f : A \to A'$,
a morphism $b : B \to A$ with $b \in M$ is called boundary over f if there is a
pushout complement such that (1) is an initial pushout over f. Initiality of
(1) over f means that for every pushout (2) with $b' \in M$ there exist unique
morphisms $b^* : B \to D$ and $c^* : C \to E$ with $b^*, c^* \in M$ such that
$b' \circ b^* = b$, $c' \circ c^* = c$ and (3) is pushout. Then B is called
boundary object and C context w.r.t. $f : A \to A'$.

Remark 6. In the classical case of graph transformations [2] the boundary B of
a graph morphism $f : A \to A'$ consists of all nodes $b \in A$ such that $f(b)$
is adjacent to an edge in $A' \setminus f(A)$. These nodes are necessary to glue A
to the context graph $C = A' \setminus f(A) \cup f(b(B))$ in order to obtain A' as
gluing of A and C via B in the initial pushout (1).
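
The description of boundary and context in Remark 6 can be turned into a small computation for plain directed graphs. The sketch assumes an injective graph morphism f, given by a node map and an edge map; the function name `boundary_and_context` and the graph encoding are hypothetical.

```python
def boundary_and_context(A, A_prime, f_nodes, f_edges):
    """Boundary nodes of A and the context graph C for an injective f : A -> A'.

    Graphs are dicts {"nodes": set, "edges": {edge_id: (source, target)}}.
    """
    image_nodes = set(f_nodes.values())
    image_edges = set(f_edges.values())

    # Edges of A' that do not come from A.
    outside = {e: st for e, st in A_prime["edges"].items() if e not in image_edges}
    touched = {v for st in outside.values() for v in st}

    # Boundary: nodes of A whose image is adjacent to an edge outside f(A).
    boundary = {b for b in A["nodes"] if f_nodes[b] in touched}

    # Context: A' without f(A), plus the images of the boundary nodes.
    context_nodes = (A_prime["nodes"] - image_nodes) | {f_nodes[b] for b in boundary}
    return boundary, {"nodes": context_nodes, "edges": outside}


# A is the edge a : 1 -> 2, embedded into the path x -e1-> y -e2-> z.
A = {"nodes": {1, 2}, "edges": {"a": (1, 2)}}
A_prime = {"nodes": {"x", "y", "z"}, "edges": {"e1": ("x", "y"), "e2": ("y", "z")}}
boundary, context = boundary_and_context(A, A_prime, {1: "x", 2: "y"}, {"a": "e1"})
```

Here only node 2 lies in the boundary, because its image y is the source of e2, and gluing A to the context along that node gives back A'.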

As pointed out in the introduction of this section the closure of initial push- 
outs under double pushouts is an important technical lemma. 



Lemma 1 (closure property of initial pushouts). Let M' be a class of
morphisms closed under pushouts and pullbacks along M-morphisms. Moreover we
assume to have initial pushouts over M'-morphisms. Then initial pushouts over
M'-morphisms are closed under double pushouts. That means given an initial
pushout (1) over $h_0 \in M'$ and a double pushout diagram (2) with
$d_0, d_1 \in M$, then (3) and (4) are initial pushouts over $d \in M'$
respectively $h_1 \in M'$ for the unique $b : B \to D$ with $d_0 \circ b = b_0$
obtained by initiality of (1).



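[Diagrams: (1) a pushout square with $b_0 : B \to G_0$, $B \to C$, $C \to G'_0$
and $h_0 : G_0 \to G'_0$; (2) a double pushout over the spans
$G_0 \xleftarrow{d_0} D \xrightarrow{d_1} G_1$ and $G'_0 \leftarrow D' \rightarrow G'_1$
with vertical morphisms $h_0$, $d : D \to D'$ and $h_1 : G_1 \to G'_1$; (3) a
pushout square with $b : B \to D$, $B \to C$, $C \to D'$ and $d$; (4) a pushout
square with $d_1 \circ b : B \to G_1$, $B \to C$, $C \to G'_1$ and $h_1$.]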



Proof idea. This can be shown stepwise for pushouts in the opposite and
in the same direction by using the properties of M and M'. The complete proof
can be found in [13]. □

The following consistency condition for a transformation $t : G_0 \Rightarrow^* G_n$
and an extension morphism $k_0 : G_0 \to G'_0$ means intuitively that the boundary
B of $k_0$ is preserved by t. In order to formulate this property we use the
notion of a derived span $der(t) = (G_0 \leftarrow D \rightarrow G_n)$ of the
transformation t, which connects the first and the last object.

Definition 7 (derived span and consistency). The derived span of a direct
transformation $G \Rightarrow H$ as shown in Def. 4 is the span
$G \leftarrow D \rightarrow H$. The derived span
$der(t) = (G_0 \xleftarrow{d_0} D \xrightarrow{d_n} G_n)$ of a transformation
$t : G_0 \Rightarrow^* G_n$ is the composition via pullbacks of the spans of the
corresponding direct transformations.

A morphism $k_0 : G_0 \to G'_0$ is called consistent w.r.t. a transformation
$t : G_0 \Rightarrow^* G_n$ with derived span
$der(t) = (G_0 \xleftarrow{d_0} D \xrightarrow{d_n} G_n)$ if there exist an
initial pushout (1) over $k_0$ and a morphism $b \in M$ with $d_0 \circ b = b_0$.

[Diagram: the initial pushout (1) with $b_0 : B \to G_0$, $B \to C$,
$C \to G'_0$ and $k_0 : G_0 \to G'_0$, together with the derived span
$G_0 \xleftarrow{d_0} D \xrightarrow{d_n} G_n$ and $b : B \to D$.]



Remark 7. 1. The morphisms of the span G <— D — > H are in M because M 

is closed under pushouts. This implies that the compositions of these spans 
exist and are M-morphisms, because pullbacks along M-morphisms exist 
and M-morphisms are closed under pullbacks in adhesive HLR categories. 

2. The consistency condition in [2], called JOIN condition, requires a suitable
family $b_i : B \to D_i$ of morphisms from the boundary B to the context
graphs $D_i$ of the direct transformations. In fact, our consistency condition
is equivalent to the existence of a corresponding family $(b_i)_{i=0,\ldots,n-1}$.

3. For the definition of consistency and for Thm. 4 below (but not for Thm. 5)
it would be sufficient to require the existence of a pushout over $k_0$ instead
of an initial one. Moreover we need only conditions 1 and 2 of Def. 3.
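
The derived span and the consistency condition become very concrete when all objects are finite sets of named elements and all span morphisms are inclusions; this is an assumption made only for this sketch, together with the assumption that a deleted name is never reused later. The pullback of two inclusions is then an intersection, so the derived span keeps exactly the elements that survive every step, and consistency says that the image of the boundary lies in that preserved part. The function names are hypothetical.

```python
from functools import reduce


def derived_span(steps):
    """Compose the spans G_i <- D_i -> G_{i+1} of a transformation via pullbacks.

    `steps` is a list of triples (G_i, D_i, G_next) of sets with D_i included
    in both G_i and G_next. For inclusions the pullback is the intersection.
    """
    first, last = steps[0][0], steps[-1][2]
    preserved = reduce(lambda acc, step: acc & step[1], steps[1:], set(steps[0][1]))
    return first, preserved, last


def is_consistent(boundary_image, preserved):
    """The boundary factors through D iff its image in G_0 lies inside D."""
    return boundary_image <= preserved


# Step 1 deletes element 3 and creates 4; step 2 deletes 2 and creates 5.
steps = [
    ({1, 2, 3}, {1, 2}, {1, 2, 4}),
    ({1, 2, 4}, {1, 4}, {1, 4, 5}),
]
G0, D, Gn = derived_span(steps)

boundary_kept = is_consistent({1}, D)      # element 1 is never deleted
boundary_deleted = is_consistent({2}, D)   # element 2 is deleted by step 2
```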



Now we are able to prove the Embedding and the Extension Theorem which 
show that consistency is sufficient and also necessary for the construction of 
extension diagrams. Moreover, we obtain a direct construction of the extension 
$k_n : G_n \to G'_n$ in the extension diagram.

Theorem 4 (Embedding Theorem). Given a transformation $t : G_0 \Rightarrow^* G_n$
and a morphism $k_0 : G_0 \to G'_0$ which is consistent w.r.t. t, then there is an
extension diagram for t and $k_0$ (see (1) in Def. 5).



Proof idea (n = 2). We construct pullback (0) leading to the derived span
$G_0 \leftarrow D_0 \leftarrow D \rightarrow D_1 \rightarrow G_2$ of the
transformation $t : G_0 \Rightarrow^* G_2$. Given $k_0$ consistent w.r.t. t we
have initial pushout (2) over $k_0$ and $b : B \to D$.



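[Diagram: the initial pushout (2) over $k_0$ with $B \to C$, the pullback (0)
with pullback object D over $D_0 \rightarrow G_1 \leftarrow D_1$, and the
pushouts (1a), (1b), (1c), (1d) between the spans
$G_0 \leftarrow D_0 \rightarrow G_1 \leftarrow D_1 \rightarrow G_2$ and
$G'_0 \leftarrow D'_0 \rightarrow G'_1 \leftarrow D'_1 \rightarrow G'_2$ with
vertical morphisms $k_0$, $k_1$, $k_2$.]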



This leads to M-morphisms $B \to D_0$ and $B \to D_1$ such that first $D'_0$ can
be constructed as pushout object of $B \to D_0$ and $B \to C$ leading by
decomposition to pushout (1a) and by construction to pushout (1b). Then $D'_1$
can be constructed as pushout of $B \to D_1$ and $B \to C$ leading by
decomposition to pushout (1c) and by construction to pushout (1d). The given
transformation $t : G_0 \Rightarrow^* G_2$ together with the pushouts (1a) - (1d)
constitutes the required extension diagram. □



Theorem 5 (Extension Theorem). Given a transformation $t : G_0 \Rightarrow^* G_n$
with derived span $der(t) = (G_0 \xleftarrow{d_0} D \xrightarrow{d_n} G_n)$ and an
extension diagram (1)

[Diagram: the extension diagram (1) for t and $k_0$ next to the initial pushout
(2) with $b_0 : B \to G_0$, $B \to C$, $C \to G'_0$ and $k_0 : G_0 \to G'_0$.]

with initial pushout (2) over $k_0 \in M'$ for some class M' closed under pushouts
and pullbacks along M-morphisms and initial pushouts over M'-morphisms, then
we have

1. $k_0$ consistent w.r.t. $t : G_0 \Rightarrow^* G_n$ with morphism $b : B \to D$,

2. a direct transformation $G'_0 \Rightarrow G'_n$ via der(t) and match $k_0$ given
by pushouts (3) and (4) with $d, k_n \in M'$,

3. initial pushouts (5) and (6) over d resp. $k_n$.



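[Diagrams: pushouts (3) and (4) between the spans
$G_0 \xleftarrow{d_0} D \xrightarrow{d_n} G_n$ and $G'_0 \leftarrow D' \rightarrow G'_n$
with vertical morphisms $k_0$, $d$ and $k_n$; initial pushout (5) with
$b : B \to D$, $B \to C$, $C \to D'$ and $d$; initial pushout (6) with
$d_n \circ b : B \to G_n$, $B \to C$, $C \to G'_n$ and $k_n$.]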



Remark 8. The extension theorem shows

1. Consistency of $k_0$ w.r.t. t is necessary for the existence of the extension
diagram.

2. The extension diagram (1) can be represented by a direct transformation
with match $k_0$ and comatch $k_n$.

3. The extension $k_n : G_n \to G'_n$ can be constructed by a pushout (6) of $G_n$
and context C along the boundary B with $d_n \circ b : B \to G_n$.

Proof idea (n = 2). Given t and $k_0$ with initial pushout (2) and the extension
diagram given by pushouts (1a) - (1d) in the proof of Thm. 4, where D is the
pullback object in (0). Initiality of (2) and pushout (1a) lead to
$b_0 : B \to D_0$ and by Lem. 1 to an initial pushout over $k_1$. This new
initiality and pushout (1c) lead to $b_1 : B \to D_1$. The morphisms $b_0$ and
$b_1$ lead to an induced $b : B \to D$, using the pullback properties of (0),
which allows us to show consistency of $k_0$ w.r.t. t. This consistency
immediately implies the pushout complement D' in (3) and pushout (4) of d and
$d_n$. Finally, the double pushout (3), (4) implies by Lem. 1 initial pushouts
(5) and (6) from (2). □

6 Critical Pairs and Local Confluence
of Adhesive HLR Systems 

Critical pairs and local confluence have been studied for hypergraph transforma- 
tions in [12] and for typed attributed graph transformation in [9]. In this section 
we present a categorical version in adhesive HLR categories. As additional re- 
quirements we only need an E'-M' pair factorization for cospans of morphisms - 
in analogy to the well-known epi-mono-factorization of morphisms - and initial 
pushouts over M'-morphisms. These assumptions are stated where necessary. 
Otherwise we only assume to have an adhesive HLR system. 

It is well-known that local confluence and termination imply confluence. But
we only analyze local confluence in this paper, and neither termination nor
general confluence.

Definition 8 (confluence, local confluence). A pair of transformations
$H_1 \Leftarrow^* G \Rightarrow^* H_2$ is confluent if there are transformations
$H_1 \Rightarrow^* X$ and $H_2 \Rightarrow^* X$. An adhesive HLR system is locally
confluent if this property holds for each pair of direct transformations; it
is confluent if it holds for all pairs of transformations.
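
On a finite abstract rewriting relation the two notions can be tested directly. The sketch below forgets the graph structure entirely: objects are opaque labels, direct transformations are given as a successor dictionary, and objects are compared by equality rather than up to isomorphism. All of this is an assumption made for illustration, and the names `step`, `reachable` and `is_locally_confluent` are hypothetical.

```python
def reachable(step, start):
    """All objects derivable from `start` in zero or more direct steps."""
    seen, todo = {start}, [start]
    while todo:
        current = todo.pop()
        for nxt in step.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def is_confluent_pair(step, h1, h2):
    """H1 and H2 are joinable if some X is derivable from both."""
    return bool(reachable(step, h1) & reachable(step, h2))


def is_locally_confluent(step):
    """Every pair of direct transformations H1 <= G => H2 must be joinable."""
    return all(
        is_confluent_pair(step, h1, h2)
        for g in step
        for h1 in step[g]
        for h2 in step[g]
    )


# A diamond: both branches from G can be joined in X.
diamond = {"G": ["H1", "H2"], "H1": ["X"], "H2": ["X"]}
# A fork: H1 and H2 are distinct and cannot be rewritten any further.
fork = {"G": ["H1", "H2"]}

diamond_ok = is_locally_confluent(diamond)
fork_ok = is_locally_confluent(fork)
```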

In order to define and construct critical pairs we introduce the notion of 
E'-M' pair factorization. 

Definition 9 (E'-M' pair factorization). An adhesive HLR category has an
E'-M' pair factorization, if M' is a class of morphisms closed under pushouts
and pullbacks along M-morphisms and E' a class of morphism pairs with same
codomain and we have for each pair of morphisms $f_1 : A_1 \to C$,
$f_2 : A_2 \to C$ that there is an object K and morphisms $e_1 : A_1 \to K$,
$e_2 : A_2 \to K$, $m : K \to C$ with $(e_1, e_2) \in E'$, $m \in M'$ such that
$m \circ e_1 = f_1$ and $m \circ e_2 = f_2$.




Remark 9. It is sufficient to require this property for matches $f_i = m_i : L_i \to G$
(i = 1, 2). The closure properties of M' are needed in Lem. 2 and Thm. 6.

The intuitive idea of morphism pairs $(e_1, e_2) \in E'$ in most example
categories is that the pair is jointly surjective resp. jointly epimorphic.
This can be achieved in categories C with binary coproducts and an $E_0$-$M_0$
factorization of morphisms, where $E_0 \subseteq$ Epis and $M_0 \subseteq$ Monos.
Given $A_1 \xrightarrow{f_1} C \xleftarrow{f_2} A_2$ we simply take an $E_0$-$M_0$
factorization $f = m \circ e$ of the induced morphism $f : A_1 + A_2 \to C$ and
define $e_1 = e \circ i_1$ and $e_2 = e \circ i_2$, where $i_1$, $i_2$ are the
coproduct injections. If the category has no binary coproducts, or the
construction above is not always adequate (as in the case of typed attributed
graph transformation), we may have another alternative to obtain an E'-M' pair
factorization. In [14] an explicit E'-M' pair factorization for typed
attributed graph transformation is provided, where M'-morphisms are not
necessarily injective on the data type part and hence $M' \not\subseteq M$.
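
For the category Sets the construction via coproducts and epi-mono factorization amounts to taking the union of the two images. The sketch below assumes E' is the class of jointly surjective pairs and M' the class of injective functions; the function name `pair_factorization` and the dictionary encoding are hypothetical.

```python
def pair_factorization(A1, A2, f1, f2):
    """Factor f1 : A1 -> C and f2 : A2 -> C through the union of their images.

    Returns (K, e1, e2, m) with e1, e2 jointly surjective onto K and
    m : K -> C the inclusion, so that m o e_i = f_i.
    """
    K = {f1[a] for a in A1} | {f2[a] for a in A2}
    e1 = {a: f1[a] for a in A1}   # f1 with codomain restricted to K
    e2 = {a: f2[a] for a in A2}   # f2 with codomain restricted to K
    m = {k: k for k in K}         # inclusion of K into C
    return K, e1, e2, m


# C = {"p", "q", "r"}; the element "r" is hit by neither f1 nor f2.
A1, A2 = {1, 2}, {3}
f1, f2 = {1: "p", 2: "p"}, {3: "q"}
K, e1, e2, m = pair_factorization(A1, A2, f1, f2)

jointly_surjective = set(e1.values()) | set(e2.values()) == K
m_injective = len(set(m.values())) == len(m)
factorizes = ({a: m[e1[a]] for a in A1} == f1) and ({a: m[e2[a]] for a in A2} == f2)
```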

The main idea to prove local confluence is to show local confluence explicitly
only for critical pairs based on the notion of parallel independence (see [1]).

Definition 10 (critical pair). Given an E'-M' pair factorization, a critical
pair is a pair of non-parallel independent direct transformations
$P_1 \stackrel{p_1, o_1}{\Longleftarrow} K \stackrel{p_2, o_2}{\Longrightarrow} P_2$
such that $(o_1, o_2) \in E'$ for the corresponding matches $o_1$ and $o_2$.

The first step towards local confluence is to show completeness of critical 
pairs. 

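Lemma 2 (completeness of critical pairs). Consider an adhesive HLR system
with E'-M' pair factorization and $M' \subseteq M$. For each pair of
non-parallel independent direct transformations
$H_1 \stackrel{p_1, m_1}{\Longleftarrow} G \stackrel{p_2, m_2}{\Longrightarrow} H_2$
there is a critical pair
$P_1 \stackrel{p_1, o_1}{\Longleftarrow} K \stackrel{p_2, o_2}{\Longrightarrow} P_2$
with extension diagrams (1) and (2) and $m \in M'$.

[Diagram: extension diagrams (1) and (2) extending
$P_1 \Leftarrow K \Rightarrow P_2$ to $H_1 \Leftarrow G \Rightarrow H_2$ along
$m : K \to G$.]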



Remark 10. If $M' \not\subseteq M$ we have to require in addition that the
pushout-pullback decomposition (Thm. 2 It. 2) holds also with $l \in M$ and
$w \in M'$ in diagram (1)+(2) of Thm. 2.



Proof. With the E'-M' pair factorization for $m_1$ and $m_2$ we get an object K
and morphisms $m : K \to G \in M'$, $o_1 : L_1 \to K$ and $o_2 : L_2 \to K$ with
$(o_1, o_2) \in E'$ such that $m_1 = m \circ o_1$ and $m_2 = m \circ o_2$. We can
now build the following extension diagram. First we construct the pullback
over $g_1$ and m, derive the morphism $t_1$ and by applying Thm. 2 It. 2 both
squares are pushouts. In the case $M' \not\subseteq M$ of Rem. 10 we have the
pushout-pullback decomposition because $l_1 \in M$ and $m \in M'$. With Def. 3
we can build the pushout over $r_1$ and $t_1$, derive the morphism $z_1$ and
with pushout decomposition this square is a pushout. The same construction is
applied for the second transformation.

The transformations $P_1 \Leftarrow K \Rightarrow P_2$ are non-parallel
independent. Otherwise there are morphisms $i : L_1 \to N_2$ and
$j : L_2 \to N_1$ with $v_2 \circ i = o_1$ and $v_1 \circ j = o_2$. Then
$g_2 \circ s_2 \circ i = m \circ v_2 \circ i = m \circ o_1 = m_1$ and
$g_1 \circ s_1 \circ j = m \circ v_1 \circ j = m \circ o_2 = m_2$, that means
$H_1 \Leftarrow G \Rightarrow H_2$ are parallel independent, contradiction. □

From [12] in the case of hypergraph transformation it is known already that 
confluence of critical pairs is not sufficient to show local confluence in general. 
In fact, we need a slightly stronger property, called strict confluence. 

Definition 11 (strict confluence of critical pairs). A critical pair
$P_1 \stackrel{p_1, o_1}{\Longleftarrow} K \stackrel{p_2, o_2}{\Longrightarrow} P_2$
is called strictly confluent, if we have

1. confluence: the critical pair is confluent, i.e. there are transformations
$P_1 \Rightarrow^* K'$, $P_2 \Rightarrow^* K'$ with derived spans
$der(P_i \Rightarrow^* K') = (P_i \xleftarrow{v_{i+2}} N_{i+2} \xrightarrow{w_{i+2}} K')$
for i = 1, 2.

2. strictness: Let $der(K \Rightarrow P_i) = (K \xleftarrow{v_i} N_i \xrightarrow{w_i} P_i)$
(i = 1, 2) and $N_5$, $N_6$ and N pullback objects of pullbacks (1), (2) resp.
(3) then there are morphisms $z_5$ and $z_6$ such that (4), (5) and (6) commute.




Remark 11. The strictness condition is a combination of corresponding condi- 
tions stated in [12] and [9]. More precisely, commutativity of (4) is required in 
[12] and that of (5) and (6) in [9]. In [12] however, commutativity of (5) and (6) 
seems to be a consequence of inclusion properties. The intuitive idea of strictness 
is that the common part N, which is preserved by each transformation of the 
critical pair, is also preserved by the transformations $P_1 \Rightarrow^* K'$ and $P_2 \Rightarrow^* K'$
and mapped by the same morphism N — > K' . 

Finally our last main result states that strict confluence of all critical pairs 
implies local confluence. This result is also known as Critical Pair Lemma (see 
[12,9]). 

Theorem 6 (Local Confluence Theorem - Critical Pair Lemma). An adhesive HLR
system with E'-M' pair factorization, $M' \subseteq M$ and initial pushouts
over M'-morphisms is locally confluent, if all its critical pairs are strictly
confluent.

Remark 12. See Rem. 10 for the case $M' \not\subseteq M$. In the proof we need
that M is closed under decomposition (see Def. 3 It. 1) in order to show
$b_3 \in M$.

Proof. Given a pair of direct transformations
$H_1 \stackrel{p_1, m_1}{\Longleftarrow} G \stackrel{p_2, m_2}{\Longrightarrow} H_2$
we have to show the existence of transformations $t'_1 : H_1 \Rightarrow^* G'$
and $t'_2 : H_2 \Rightarrow^* G'$. If the given pair is parallel independent this
follows from the local Church-Rosser theorem. If the given pair is not parallel
independent Lem. 2 implies the existence of a critical pair
$P_1 \stackrel{p_1, o_1}{\Longleftarrow} K \stackrel{p_2, o_2}{\Longrightarrow} P_2$
with extension diagrams (7) and (8) and $m \in M'$. By assumption this critical
pair is strictly confluent leading to transformations
$t_1 : P_1 \Rightarrow^* K'$, $t_2 : P_2 \Rightarrow^* K'$ and the diagram in
Def. 11.




Now let (9) be an initial pushout over m £ M' and consider the double pushouts 
(12), (13) and (14), (15) corresponding to extension diagrams (7) and (8) re- 
spectively. 



[Diagram: the initial pushout (9) over m together with the double pushouts
(12), (13) and (14), (15), the pushout (18) and the triangles (16), (17) and
(19) used below.]



Initiality of (9) applied to pushout (13) leads to unique $b_1, c_1 \in M$ such
that (16) and (17) commute and (18) is pushout. By Lem. 1 (18) is initial
pushout over $s_1$. Dually we obtain $b_2, c_2 \in M$ with $v_2 \circ b_2 = b$.
Using the pullback property of (3) in Def. 11 we obtain a unique
$b_3 : B \to N$ with $z_1 \circ b_3 = b_1$ and $z_2 \circ b_3 = b_2$. Moreover
$b_1, z_1 \in M$ implies $b_3 \in M$ by the decomposition property of M. In
order to show consistency of $q_1$ w.r.t. $t_1$ we have to construct
$b'_3 \in M$ such that (19) commutes where (18)+(12) is initial pushout over
$q_1$ by Lem. 1. In fact $b'_3 = w_5 \circ z_5 \circ b_3 \in M$ makes (19)
commutative using (5) in Def. 11.

Dually $q_2$ is consistent w.r.t. $t_2$ using
$b'_4 = w_6 \circ z_6 \circ b_3 \in M$ and (6) in Def. 11. By the Embedding
Theorem we obtain extension diagrams (10) and (11), where the morphism
$q : K' \to G'$ is the same in both cases. This equality can be shown using part
3 of the Extension Theorem, where q is determined by an initial pushout of
$B \to C$ and $w_3 \circ b'_3 : B \to K'$ in the first and
$w_4 \circ b'_4 : B \to K'$ in the second case and we have
$w_3 \circ b'_3 = w_4 \circ b'_4$ using commutativity of (4) in Def. 11. □

7 Conclusion

In this paper we have introduced adhesive HLR categories and systems
combining the framework of adhesive categories in [3] and of HLR systems in [1].
We claim that this new framework is most important for different theories of
graphs and graphical structures in computer science, which are mainly based on
pushout constructions. As shown in this paper this includes first of all the double
pushout approach in the theory of graph transformation and HLR systems [2,
1, 5], where important new results have been presented in this framework which
are already applied to typed attributed graph transformation in [14]. Constraints
and application conditions for DPO-transformations of adhesive HLR systems
are considered already in [16]. On the other hand pushouts have also been used
in semantics in order to derive well-behaved labeled transition systems by Leifer
and Milner in [17], by Sassone and Sobocinski in [18] and by König and Ehrig
in [19]. We agree with [3] that the role of adhesive categories - and even more
adhesive HLR categories - for this kind of applications is most likely to become
comparable to the role of cartesian closed categories for simply typed lambda
calculi as pointed out by Lambek and Scott in [20].


Abstract. The concept of typed attributed graph transformation is
most significant for modeling and meta modeling in software engineering 
and visual languages, but up to now there is no adequate theory for this 
important branch of graph transformation. In this paper we give a new 
formalization of typed attributed graphs, which allows node and edge at- 
tribution. The first main result shows that the corresponding category is 
isomorphic to the category of algebras over a specific kind of attributed 
graph structure signature. This allows us to prove the second main result
showing that the category of typed attributed graphs is an instance of 
“adhesive HLR categories”. This new concept combines adhesive cate- 
gories introduced by Lack and Sobocinski with the well-known approach 
of high-level replacement (HLR) systems using a new simplified version 
of HLR conditions. As a consequence we obtain a rigorous approach to 
typed attributed graph transformation providing as fundamental results 
the Local Church-Rosser, Parallelism, Concurrency, Embedding and Ex- 
tension Theorem and a Local Confluence Theorem known as Critical 
Pair Lemma in the literature. 



1 Introduction 

The algebraic theory of graph transformation based on labeled graphs and the 
double-pushout approach has already a long tradition (see [1]) with various ap- 
plications (see [2, 3]). Within the last decade graph transformation has been used 
as a modeling technique in software engineering and as a meta-language to spec- 
ify and implement visual modeling techniques like the UML. Especially for these 
applications it is important to use not only labeled graphs as considered in the 
classical approach [1], but also typed and attributed graphs. In fact, there are 
already several different concepts for typed and attributed graph transformation 
in the literature (see e.g. [4-6]). However, there is no adequate theory for this 
important branch of graph transformation up to now. The key idea in [5] is to 
model an attributed graph with node attribution as a pair AG = (G, A) of a 
graph G and a data type algebra A. In this paper we use this idea to model 
attributed graphs with node and edge attribution, where G is now a new kind 
of graph, called E-graph, which allows also edges from edges to attribute nodes.
This new kind of attributed graphs combined with the concept of typing leads 
to a category AGraphsATG of attributed graphs typed over an attributed type 
graph ATG. This category seems to be an adequate formal model not only for
various applications in software engineering and visual languages, but also for 
the internal representation of attributed graphs in our graph transformation tool 
AGG [7]. 

The main purpose of this paper is to provide the basic concepts and results 
of graph transformation known in the classical case [1] for this new kind of typed 
attributed graphs. The straightforward way would be to extend the classical the- 
ory in [1] step by step first to attributed graphs and then to typed attributed 
graphs. In this paper we propose the more elegant solution to obtain the theory 
of typed attributed graph transformation as an instantiation of the
corresponding categorical theory developed in [8]. In [8] we have proposed the
new concept of “adhesive HLR categories and systems”, which combines the
concept of “adhesive categories” presented by Lack and Sobocinski in [9] with
the concept of high-level replacement systems, short HLR systems, introduced
in [10]. In [8] we have shown that not only the Local Church-Rosser,
Parallelism and Concurrency Theorem (presented already in [10] for HLR
systems), but also several other results known from the classical theory
[1, 9] are valid for adhesive HLR systems satisfying some additional HLR
properties.

For this purpose we have to show that the category AGraphsATG of typed 
attributed graphs is an adhesive HLR category in this sense. In Thm. 1 we 
show that the category AGraphsATG is isomorphic to a category of algebras 
over a suitable signature AGSIG(ATG), which is uniquely defined by the at- 
tributed type graph ATG. In fact, it is much easier to verify the categorical 
properties required for adhesive HLR categories for the category of algebras 
AGSIG(ATG)-Alg and to show the isomorphism between AGSIG(ATG)- 
Alg and AGraphsATG, than to show the categorical properties directly for 
the category AGraphsATG. In Thm. 2 we show that AGSIG(ATG)-Alg and
hence also AGraphsATG is an adhesive HLR category. In fact, we show this 
result for the category AGSIG-Alg, where AGSIG is a more general kind of 
attributed graph structure signature in the sense of [4,11,12]. Combining the 
main results of this paper with those of [8] we are able to show that the fol- 
lowing basic results shown in Thm. 3-5 are valid for typed attributed graph 
transformation: 

1. Local Church-Rosser, Parallelism and Concurrency Theorem,

2. Embedding and Extension Theorem, 

3. Local Confluence Theorem (Critical Pair Lemma). 

Throughout the paper we use a running example from the area of model 
transformation to illustrate the main concepts and results. We selected a small 
set of model elements, basic for all kinds of object-oriented models. It describes 
the abstract syntax, i.e. the structure of method signatures. These structures 
are naturally represented by node and edge attributed graphs where node at- 
tributes store e.g. names, while edge attributes are useful to keep e.g. the order 
of parameters belonging to one method. Attributed graph transformation is used 
to specify simple refactorings on this model part such as adding a parameter,
exchanging two parameters, etc. Such refactorings are usually not independent
of each other. Within this paper we analyse the given refactoring rules
concerning potential conflicts and report them as critical pairs.

Node and edge attributed graphs form the basic structures in the graph
transformation environment AGG [7]. The attribution is done by Java objects
and expressions. We use AGG to implement our running example and to compute
all its critical pairs. In GenGED [13], a visual environment for the definition
of visual languages, the internal structures are AGSIG-algebras for attributed
graph structure signatures AGSIG discussed above.

This paper is organized as follows. In section 2 we introduce node and edge 
attributed graphs and typing and present our first main result. Typed attributed 
graphs in the framework of adhesive HLR categories are discussed in section 3 
together with our second main result. This allows us to present the theory of typed
attributed graph transformation in section 4 as an instance of the general theory
in [8]. Finally we discuss related work and future perspectives in section 5.

For lack of space we can only present short proof ideas in this paper and refer 
to our technical report [14] for more detail. 

2 Node and Edge Attributed Graphs and Typing 

In this section we present our new notion of node and edge attributed graphs,
which generalizes the concept of node attributed graphs in [5], where node at-
tributes are modelled by edges from graph nodes to data nodes. The new concept
is based on graphs, called E-graphs, which also allow edges from graph edges to
data nodes in order to model edge attributes. An attributed graph AG = (G, D)
consists of an E-graph G and a data type D, where parts of the data of D are
also vertices in G. This leads to the category AGraphs of attributed graphs and
AGraphsATG of typed attributed graphs over an attributed type graph ATG.
The main result in this section shows that AGraphsATG is isomorphic to a
category AGSIG(ATG)-Alg of algebras over a suitable signature AGSIG(ATG),
which is in one-to-one correspondence with ATG.

In our notion of E-graphs we distinguish between two kinds of vertices, called 
graph and data vertices, and three different kinds of edges, according to the 
different roles they play for the representation and implementation of attributed 
graphs. 

Definition 1 (E-graph). An E-graph $G = (V_1, V_2, E_1, E_2, E_3, (source_i, target_i)_{i=1,2,3})$
consists of sets

- $V_1$ and $V_2$ called graph resp. data nodes,

- $E_1$, $E_2$, $E_3$ called graph, node attribute and edge attribute edges respectively,
and source and target functions

- $source_1 : E_1 \to V_1$, $source_2 : E_2 \to V_1$, $source_3 : E_3 \to E_1$,

- $target_1 : E_1 \to V_1$, $target_2 : E_2 \to V_2$, $target_3 : E_3 \to V_2$.

An E-graph morphism $f : G_1 \to G_2$ is a tuple $(f_{V_1}, f_{V_2}, f_{E_1}, f_{E_2}, f_{E_3})$ with
$f_{V_i} : G_{1,V_i} \to G_{2,V_i}$ and $f_{E_j} : G_{1,E_j} \to G_{2,E_j}$ for $i = 1, 2$, $j = 1, 2, 3$ such
that f commutes with all source and target functions.

E-graphs combined with E-graph morphisms form the category EGraphs. 
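
The notion of E-graph and E-graph morphism can be made concrete with a small data structure. The following is only a minimal sketch: the class name `EGraph`, its field names, the function `is_morphism` and the attribute name `pname` are illustrative assumptions and not part of the formal development. A morphism is represented by a single dictionary over all nodes and edges, assuming that element names are unique across the five components.

```python
from dataclasses import dataclass, field


@dataclass
class EGraph:
    """E-graph; each dict maps an edge name to its source or target."""
    v1: set                                     # graph nodes V_1
    v2: set                                     # data nodes V_2
    src1: dict = field(default_factory=dict)    # source_1 : E_1 -> V_1
    tgt1: dict = field(default_factory=dict)    # target_1 : E_1 -> V_1
    src2: dict = field(default_factory=dict)    # source_2 : E_2 -> V_1
    tgt2: dict = field(default_factory=dict)    # target_2 : E_2 -> V_2
    src3: dict = field(default_factory=dict)    # source_3 : E_3 -> E_1
    tgt3: dict = field(default_factory=dict)    # target_3 : E_3 -> V_2

    def is_well_formed(self) -> bool:
        """Check domains and codomains of all source and target functions."""
        e1 = set(self.src1)
        return (
            set(self.src1) == set(self.tgt1)
            and set(self.src2) == set(self.tgt2)
            and set(self.src3) == set(self.tgt3)
            and set(self.src1.values()) <= self.v1
            and set(self.tgt1.values()) <= self.v1
            and set(self.src2.values()) <= self.v1
            and set(self.tgt2.values()) <= self.v2
            and set(self.src3.values()) <= e1
            and set(self.tgt3.values()) <= self.v2
        )


def is_morphism(f: dict, g: EGraph, h: EGraph) -> bool:
    """Check that f is an E-graph morphism g -> h (both assumed well formed)."""
    components = [
        (g.v1, h.v1), (g.v2, h.v2),
        (set(g.src1), set(h.src1)),
        (set(g.src2), set(h.src2)),
        (set(g.src3), set(h.src3)),
    ]
    # f must map each component of g into the same component of h
    for dom, cod in components:
        if any(x not in f or f[x] not in cod for x in dom):
            return False
    # f must commute with all source and target functions
    pairs = [(g.src1, h.src1), (g.tgt1, h.tgt1), (g.src2, h.src2),
             (g.tgt2, h.tgt2), (g.src3, h.src3), (g.tgt3, h.tgt3)]
    return all(hf[f[e]] == f[x] for gf, hf in pairs for e, x in gf.items())


# A fragment of a type graph for method signatures (attribute name "pname" is hypothetical)
tg = EGraph(
    v1={"Method", "Parameter"}, v2={"String", "Nat"},
    src1={"parameter": "Method"}, tgt1={"parameter": "Parameter"},
    src2={"pname": "Parameter"}, tgt2={"pname": "String"},
    src3={"order": "parameter"}, tgt3={"order": "Nat"},
)
# An instance graph: method m with one parameter par1 named "p1" at position 1
g = EGraph(
    v1={"m", "par1"}, v2={"p1", 1},
    src1={"e": "m"}, tgt1={"e": "par1"},
    src2={"a": "par1"}, tgt2={"a": "p1"},
    src3={"o": "e"}, tgt3={"o": 1},
)
t = {"m": "Method", "par1": "Parameter", "p1": "String", 1: "Nat",
     "e": "parameter", "a": "pname", "o": "order"}
```

Here `is_morphism(t, g, tg)` checks the E-graph part of a typing morphism only; the algebra homomorphism and the pullback condition of attributed graph morphisms are not modelled.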



The following notions of attributed and typed attributed graphs are in the 
spirit of (node) attributed graphs of [5], where graphs are replaced by E-graphs in 
order to allow node and edge attribution. A data signature DSIG is an ordinary 
algebraic signature (see [15]). 



Definition 2 (attributed graph). Consider a data signature $DSIG = (S_D, OP_D)$
with attribute value sorts $S'_D \subseteq S_D$. An attributed graph AG =
(G, D) consists of an E-graph G together with a DSIG-algebra D such that

$$\biguplus_{s \in S'_D} D_s = G_{V_2}.$$

An attributed graph morphism is a pair $f = (f_G, f_A)$ with an E-graph morphism
$f_G$ and an algebra homomorphism $f_A$ such that (1) is a pullback for all $s \in S'_D$,
where the vertical arrows are inclusions.

$$\begin{array}{ccc}
D_{1,s} & \xrightarrow{\;f_{A,s}\;} & D_{2,s} \\
\downarrow & (1) & \downarrow \\
G_{1,V_2} & \xrightarrow{\;f_{G,V_2}\;} & G_{2,V_2}
\end{array}$$



Attributed graphs and attributed graph morphisms form the category AGraphs. 



Remark 1. The pullback property for the graph morphism is required for Thm.
1; otherwise the categories in this theorem would not be isomorphic.

Definition 3 (typed attributed graph). An attributed type graph is an at-
tributed graph ATG = (TG, Z) where Z is the final DSIG-algebra.

A typed attributed graph (AG, t) over ATG consists of an attributed graph AG
together with an attributed graph morphism $t : AG \to ATG$. A typed attributed
graph morphism $f : (AG_1, t_1) \to (AG_2, t_2)$ is an attributed graph morphism
$f : AG_1 \to AG_2$ such that $t_2 \circ f = t_1$.

Typed attributed graphs over ATG and typed attributed graph morphisms form
the category AGraphsATG. The class of all attributed type graphs ATG is de-
noted by ATG-Graphs.

Example 1 (typed attributed graphs). Given suitable signatures CHAR,
STRING and NAT, the data signature DSIG is defined by

DSIG = CHAR + STRING + NAT +
sorts: ParameterDirectionKind
opns: in, out, inout, return: $\to$ ParameterDirectionKind

and the set of all data sorts used for attribution is $S'_D$ = {String, Nat,
ParameterDirectionKind}. Fig. 1 shows an attributed type graph ATG = (TG, Z) for
method signatures. It is an attributed graph where each data element is named
after its corresponding sort, because the final DSIG-algebra Z has sorts $Z_s = \{s\}$
for all $s \in S_D$. Note that TG is an E-graph with edge attribute edge “order”
from “parameter” to “Nat”. An attributed graph AG typed over ATG is given
in Fig. 2, where only those algebra elements are shown explicitly which are used
for attribution.

Fig. 1. (attributed type graph ATG; figure not reproduced)

Fig. 2. (attributed graph AG typed over ATG; figure not reproduced)

The graph AG is typed over ATG by the attributed graph mor-
phism $t : AG \to ATG$ defined on vertices by t(m) = Method, $t(par_1) = t(par_2)
= t(par_3)$ = Parameter, t(c) = Class, t(1) = t(2) = t(3) = Nat, t(return) = t(in)
= ParameterDirectionKind and t(p1) = t(p2) = t(add) = t(Nat) = String. In
AGG, a typed attributed graph like the one in Fig. 2 is depicted in a more com- 
pact notation like the graph in Fig. 3. Each node and edge inscription has two 
compartments. The upper compartment contains the type of a graph element, 
while the lower one holds its attributes. The attributes are ordered in a list, just 
for convenience. Nodes and edges are not explicitly named. While the formal
concept of an attributed graph allows partial attribution in the sense that there
is no edge from a graph node or edge to a data node, this is not possible in
AGG. Thus, parameter $par_3$ has to be named by an empty string. Furthermore,
the formal concept allows several outgoing attribute edges from one graph node
or edge, which is also not possible in AGG. □
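
The two restrictions of AGG mentioned at the end of Example 1 (no partial attribution, no multiple values for one attribute) can be phrased as a simple check. The following is a hedged sketch only: the function `is_agg_compatible`, its argument layout and the attribute name `pname` are assumptions made for illustration and do not describe the actual AGG implementation.

```python
from collections import Counter


def is_agg_compatible(type_of, attribute_edges, attributes_of_type):
    """Return True if every graph element has exactly one value per declared attribute.

    type_of:            graph node or edge -> its type in ATG
    attribute_edges:    list of (element, attribute name, data value) triples
    attributes_of_type: type -> set of attribute names declared for that type
    """
    counts = Counter((element, name) for element, name, _ in attribute_edges)
    return all(
        counts[(element, name)] == 1
        for element, element_type in type_of.items()
        for name in attributes_of_type.get(element_type, set())
    )


type_of = {"par1": "Parameter", "par3": "Parameter"}
attributes_of_type = {"Parameter": {"pname"}}   # "pname" is a hypothetical attribute name

partial = [("par1", "pname", "p1")]              # par3 carries no name
total = partial + [("par3", "pname", "")]        # par3 named by the empty string
multiple = total + [("par1", "pname", "p2")]     # par1 carries two names
```

With these inputs only `total` passes the check, which mirrors the remark that an unnamed parameter has to be given an empty string in AGG.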

The category AGraphsATG introduced above is the basis for our theory of 
typed attributed graph transformation in this paper. In order to prove properties 
for AGraphsATG, however, it is easier to represent AGraphsATG as a category 
AGSIG(ATG)-Alg of classical algebras (see [15]) over a suitable signature 
AGSIG(ATG). For this purpose we introduce the notion of general respectively 
well-structured attributed graph structure signatures AGSIG where the well- 
structured case corresponds to attributed graph signatures in the LKW-approach 
[4]. The signature AGSIG(ATG) becomes a special case of a well-structured 
AGSIG. 












166 Hartmut Ehrig, Ulrike Prange, and Gabriele Taentzer 




Fig. 3. 



Definition 4 (attributed graph structure signature). A graph structure
signature $GSIG = (S_G, OP_G)$ is an algebraic signature with unary operations
$op : s \to s'$ in $OP_G$ only. An attributed graph structure signature AGSIG =
(GSIG, DSIG) consists of a graph structure signature GSIG and a data sig-
nature $DSIG = (S_D, OP_D)$ with attribute value sorts $S'_D \subseteq S_D$ such that
$S'_D = S_D \cap S_G$ and $OP_D \cap OP_G = \emptyset$.

AGSIG is called well-structured if for each $op : s \to s'$ in $OP_G$ we have $s \notin S_D$.
The category of all AGSIG-algebras and AGSIG-homomorphisms (see [15]) is
denoted by AGSIG-Alg.

Theorem 1 (Characterization of AGraphsATG). For each attributed type
graph ATG there is a well-structured attributed graph structure signature
AGSIG(ATG) such that AGraphsATG is isomorphic to the category
AGSIG(ATG)-Alg of AGSIG(ATG)-algebras:

AGraphsATG $\cong$ AGSIG(ATG)-Alg.

Construction. Given ATG = (TG, Z) with final DSIG-algebra Z we have
$TG_{V_2} = \biguplus_{s \in S'_D} Z_s = S'_D$ and define $AGSIG(ATG) = (GSIG = (S_G, OP_G), DSIG)$
with $S_G = S_V \cup S_E$ and $S_V = TG_{V_1} \cup TG_{V_2}$, $S_E = TG_{E_1} \cup TG_{E_2} \cup TG_{E_3}$
and $OP_G = \bigcup_{e \in S_E} OP_e$ with $OP_e = \{src_e, tar_e\}$ defined by

- $src_e : e \to v(e)$ for $e \in TG_{E_1}$ with $v(e) = source_1^{TG}(e) \in TG_{V_1}$,

- $tar_e : e \to v'(e)$ for $e \in TG_{E_1}$ with $v'(e) = target_1^{TG}(e) \in TG_{V_1}$,

- $src_e$, $tar_e$ for $e \in TG_{E_2}$ and $e \in TG_{E_3}$ defined analogously.
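
The construction of the graph structure part of AGSIG(ATG) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the dictionary layout of the type graph, the function names and the edge directions of the example fragment are hypothetical, and the well-structuredness check assumes that Def. 4 requires that no operation in $OP_G$ has a data sort as its domain.

```python
def agsig_from_type_graph(tg):
    """Build the sorts S_G and the unary operations OP_G from a type graph.

    tg["v1"], tg["v2"]:           sets of graph and data nodes of TG
    tg["e1"], tg["e2"], tg["e3"]: dicts mapping an edge to (source, target)
    """
    sorts = set(tg["v1"]) | set(tg["v2"])
    operations = {}
    for kind in ("e1", "e2", "e3"):
        for edge, (source, target) in tg[kind].items():
            sorts.add(edge)                               # every edge of TG becomes a sort
            operations[f"src_{edge}"] = (edge, source)    # src_e : e -> source(e)
            operations[f"tar_{edge}"] = (edge, target)    # tar_e : e -> target(e)
    return sorts, operations


def is_well_structured(operations, data_sorts):
    """No operation of OP_G may start at a data sort (assumed reading of Def. 4)."""
    return all(domain not in data_sorts for domain, _ in operations.values())


# Fragment of a type graph for method signatures (edge directions are assumptions)
type_graph = {
    "v1": {"Method", "Parameter"},
    "v2": {"String", "Nat"},
    "e1": {"parameter": ("Method", "Parameter")},
    "e2": {"mname": ("Method", "String")},
    "e3": {"order": ("parameter", "Nat")},
}
sorts, operations = agsig_from_type_graph(type_graph)
```

In this encoding an AGSIG(ATG)-algebra assigns a carrier set to every node and edge of TG and a function to every `src_e` and `tar_e`, which corresponds to grouping the elements of a typed attributed graph by their type.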



Proof idea. Based on the construction above we are able to construct a functor
F : AGraphsATG $\to$ AGSIG(ATG)-Alg and a corresponding inverse functor
$F^{-1}$. □

3 Typed Attributed Graphs in the Framework of Adhesive HLR Categories



As pointed out in the introduction we are not going to develop the theory of
typed attributed graph transformation directly. Instead, we show that it can
be obtained as an instantiation of the theory of adhesive HLR systems, where
this new concept (see [8]) is a combination of adhesive categories and grammars
(see [9]) and HLR systems introduced in [10]. For this purpose we present in
this section the general concept of adhesive HLR categories and we show that
AGSIG-Alg, AGSIG(ATG)-Alg and especially the category AGraphsATG
of typed attributed graphs are adhesive HLR categories for a suitable class M
of morphisms. Moreover our categories satisfy some additional HLR conditions,
which are required in the general theory of adhesive HLR systems (see [8]). This
allows us to apply the corresponding results to typed attributed graph transforma-
tion systems, which will be done in the next section.

We start with the new concept of adhesive HLR categories introduced in [8] 
in more detail. 



Definition 5 (adhesive HLR category). A category C with a morphism
class M is called adhesive HLR category, if

1. M is a class of monomorphisms closed under isomorphisms and closed under
composition ($f : A \to B \in M$, $g : B \to C \in M \Rightarrow g \circ f \in M$) and decompo-
sition ($g \circ f \in M$, $g \in M \Rightarrow f \in M$),

2. C has pushouts and pullbacks along M-morphisms, i.e. if one of the given
morphisms is in M, then also the opposite one is in M, and M-morphisms
are closed under pushouts and pullbacks,

3. pushouts in C along M-morphisms are van Kampen (VK) squares, where a
pushout (1) is called VK square, if for any commutative cube (2) with (1)
in the bottom and pullbacks in the back faces we have: the top is pushout $\Leftrightarrow$
the front faces are pullbacks.

$$\begin{array}{ccc}
C & \xrightarrow{\;f\;} & B \\
{\scriptstyle m}\downarrow & (1) & \downarrow{\scriptstyle n} \\
A & \xrightarrow{\;g\;} & D
\end{array}$$

(2) Commutative cube with the square (1) as its bottom face (figure not reproduced).



Important examples of adhesive HLR categories are the category (Sets,
$M_{inj}$) of sets with class $M_{inj}$ of all injective functions, the category (Graph,
$M_{inj}$) of graphs with class $M_{inj}$ of injective graph morphisms and different kinds
of labelled and typed graphs (see [8]). Moreover all HLR1 and HLR2 categories
presented in [10] are adhesive HLR categories. In the following we will show
that also our categories AGSIG-Alg, AGSIG(ATG)-Alg and AGraphsATG
presented in the previous section are adhesive HLR categories for the class M of
all injective morphisms with isomorphic data type part, which is used for typed
attributed graph transformation systems in the next section.

Definition 6 (class M for typed attributed graph transformation). The
distinguished class M is defined by $f \in M$ if

1. $f_G$ is injective and $f_A$ is an isomorphism for $f = (f_G, f_A)$ in AGraphsATG and
AG = (G, A),

2. $f_{GSIG}$ is injective and $f_{DSIG}$ is an isomorphism for f in AGSIG-Alg or
AGSIG(ATG)-Alg and AGSIG = (GSIG, DSIG).

Remark 2. The corresponding categories (AGraphsATG, M), (AGSIG-Alg,
M) and (AGSIG(ATG)-Alg, M) are adhesive HLR categories (see Thm. 2).
For simplicity we use the same notation M in all three cases. For practical
applications we assume that $f_A$ and $f_{DSIG}$ are identities.

This class M of morphisms is on the one hand the adequate class to define 
productions of typed attributed graph transformation systems (see Def. 7), on 
the other hand it allows to construct pushouts along M-morphisms componen- 
twise. This is essential to verify the properties of adhesive HLR categories. 

Lemma 1 (properties of pushouts and pullbacks in (AGSIG-Alg, M)).

1. Given $m : C \to A$ and $f : C \to B$ with $m \in M$ then there is a pushout (1)
in AGSIG-Alg with $n \in M$.

$$\begin{array}{ccc}
C & \xrightarrow{\;f\;} & B \\
{\scriptstyle m}\downarrow & (1) & \downarrow{\scriptstyle n} \\
A & \xrightarrow{\;g\;} & D
\end{array}$$

Moreover given (1) commutative with $m \in M$ then (1) is a pushout in
AGSIG-Alg iff (1) is a componentwise pushout in Sets. If $m \in M$ then
also $n \in M$.

2. Given $g : A \to D$ and $n : B \to D$ then there is a pullback (1) in AGSIG-Alg.
Moreover given (1) commutative then (1) is a pullback in AGSIG-Alg
iff (1) is a componentwise pullback in Sets. If $n \in M$ then also $m \in M$.

Proof. See [14].

Remark 3. Since AGSIG(ATG) is a special case of AGSIG, the lemma is also
true for AGSIG(ATG)-Alg. It is well-known that AGSIG-Alg, as a category
of algebras, has pushouts even if $m \notin M$, but in general such pushouts cannot
be constructed componentwise.
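
Since pushouts along M-morphisms are computed componentwise in Sets, it may help to see the construction for a single component. The following minimal sketch computes the pushout of two functions in Sets where one of them is injective; the function name and the tagging of elements are illustrative assumptions.

```python
def pushout_along_injection(A, B, C, m, f):
    """Pushout of A <-m- C -f-> B in Sets for an injective function m.

    Returns (D, g, n) with g : A -> D and n : B -> D such that g∘m = n∘f.
    D consists of B together with the elements of A outside the image of m;
    elements are tagged so that the two parts stay disjoint.
    """
    m_inverse = {m[c]: c for c in C}
    if len(m_inverse) != len(C):
        raise ValueError("m must be injective")
    D = {("B", b) for b in B} | {("A", a) for a in A if a not in m_inverse}
    n = {b: ("B", b) for b in B}                      # n is injective by construction
    g = {a: n[f[m_inverse[a]]] if a in m_inverse else ("A", a) for a in A}
    return D, g, n


C = {"k"}
A = {"a_k", "a_x"}
B = {"b_k", "b_y"}
m = {"k": "a_k"}
f = {"k": "b_k"}
D, g, n = pushout_along_injection(A, B, C, m, f)
```

The injectivity of `n` in this sketch reflects the statement of Lemma 1 that $m \in M$ implies $n \in M$ on the level of a single carrier set.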

Theorem 2 (adhesive HLR categories). The category (AGraphsATG, M)
of typed attributed graphs and also (AGSIG-Alg, M) and (AGSIG(ATG)-Alg,
M) are adhesive HLR categories.

Proof. It is sufficient to prove the properties for (AGSIG-Alg, M), because
(AGSIG(ATG)-Alg, M) is a special case of (AGSIG-Alg, M) and
(AGraphsATG, M) $\cong$ (AGSIG(ATG)-Alg, M) by Thm. 1.

1. The class M given in Def. 6 is a subclass of monomorphisms, because 
monomorphisms in AGSIG-Alg are exactly the injective homomorphisms, 
and it is closed under composition and decomposition. 

2. (AGSIG-Alg, M) has pushouts and pullbacks along M-morphisms and 
M-morphisms are closed under pushouts and pullbacks due to Lem. 1. 

3. Pushouts along M-morphisms in AGSIG-Alg are VK squares because
pushouts and pullbacks are constructed componentwise in Sets (see Lem.
1) and for (Sets, $M_{inj}$) pushouts along monomorphisms are VK squares as
shown in [9].

4 Theory of Typed Attributed Graph Transformation 

After the preparations in the previous sections we are now ready to present the
basic notions and results for typed attributed graph transformation systems.
In fact, we obtain all the results presented for adhesive HLR systems in [8] in
our case, because we are able to show the corresponding HLR conditions for
(AGraphsATG, M). This category (AGraphsATG, M) is fixed now for this
section.

Definition 7 (typed attributed graph transformation system). A typed
attributed graph transformation system GTS = (DSIG, ATG, S, P) based on
(AGraphsATG, M) consists of a data type signature DSIG, an attributed type
graph ATG, a typed attributed graph S, called start graph, and a set P of pro-
ductions, where

1. a production $p = (L \xleftarrow{\;l\;} K \xrightarrow{\;r\;} R)$ consists of typed attributed graphs L, K
and R attributed over the term algebra $T_{DSIG}(X)$ with variables X, called
left hand side L, gluing object K and right hand side R respectively, and
morphisms $l, r \in M$, i.e. l and r are injective and isomorphisms on the data
type $T_{DSIG}(X)$,

2. a direct transformation $G \stackrel{p,m}{\Longrightarrow} H$ via a production p and a morphism
$m : L \to G$, called match, is given by the following diagram, called dou-
ble pushout diagram, where (1) and (2) are pushouts in AGraphsATG.

$$\begin{array}{ccccc}
L & \xleftarrow{\;l\;} & K & \xrightarrow{\;r\;} & R \\
{\scriptstyle m}\downarrow & (1) & \downarrow & (2) & \downarrow \\
G & \longleftarrow & D & \longrightarrow & H
\end{array}$$

3. a typed attributed graph transformation, short transformation, is a sequence
$G_0 \Rightarrow G_1 \Rightarrow \dots \Rightarrow G_n$ of direct transformations, written $G_0 \stackrel{*}{\Rightarrow} G_n$,

4. the language L(GTS) is defined by $L(GTS) = \{G \mid S \stackrel{*}{\Rightarrow} G\}$.

Remark 4. A typed attributed graph transformation system is an adhesive HLR
system in the sense of [8] based on the adhesive HLR category (AGraphsATG,
M).
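
To illustrate how a direct transformation is computed from a double pushout diagram, the following sketch performs one step on plain directed graphs. It is a strong simplification and not the construction in AGraphsATG: attributes and typing are ignored, the match is assumed to be injective, K is assumed to be a subgraph of both L and R (so l and r are inclusions), and node and edge names are assumed to be disjoint. The function name `dpo_step` and the graph encoding are hypothetical.

```python
def dpo_step(G, L, K, R, match):
    """One double pushout step on plain graphs.

    Each graph is a pair (nodes, edges) with edges: name -> (source, target).
    match maps every node and edge of L to G. Returns H, or None if the
    gluing condition is violated (a dangling edge would remain).
    """
    g_nodes, g_edges = G
    l_nodes, l_edges = L
    k_nodes, k_edges = K
    r_nodes, r_edges = R

    # Items of G matched by L but not preserved by K are deleted
    deleted_nodes = {match[v] for v in l_nodes - k_nodes}
    deleted_edges = {match[e] for e in set(l_edges) - set(k_edges)}

    # Dangling condition: no remaining edge may touch a deleted node
    for edge, (s, t) in g_edges.items():
        if edge not in deleted_edges and (s in deleted_nodes or t in deleted_nodes):
            return None

    # Pushout (1): the context graph D is G without the deleted items
    d_nodes = g_nodes - deleted_nodes
    d_edges = {e: st for e, st in g_edges.items() if e not in deleted_edges}

    # Pushout (2): glue the items of R outside K onto D under fresh names
    created = (r_nodes - k_nodes) | (set(r_edges) - set(k_edges))
    fresh = {x: ("new", x) for x in created}

    def image(node):
        return match[node] if node in k_nodes else fresh[node]

    h_nodes = d_nodes | {fresh[v] for v in r_nodes - k_nodes}
    h_edges = dict(d_edges)
    for edge, (s, t) in r_edges.items():
        if edge not in k_edges:
            h_edges[fresh[edge]] = (image(s), image(t))
    return h_nodes, h_edges


# A rule that adds a parameter node p and an edge e to a matched method node m
L = ({"m"}, {})
K = ({"m"}, {})
R = ({"m", "p"}, {"e": ("m", "p")})
G = ({"m1"}, {})
H = dpo_step(G, L, K, R, {"m": "m1"})
```

In the attributed case, changing an attribute value corresponds to deleting and creating attribute edges, so the same two-phase scheme applies to the E-graph part while the data algebra is preserved.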

Example 2 (typed attributed graph transformation system). In the following, we
start to define our typed attributed graph transformation system
MethodModelling by giving the productions. All graphs occurring are attributed
by the term algebra $T_{DSIG}(X)$ with DSIG being the data signature presented in Ex.
1 and variables X, i.e. $X = X_{String} \cup X_{Int} \cup X_{ParameterDirectionKind}$ with
$X_{String} = \{m, p, ptype, P1, P2\}$, $X_{ParameterDirectionKind} = \{k\}$ and $X_{Int} =
\{n, x, y\}$. We present the productions in the form of AGG productions where
we have the possibility to define a subset of variables of X as input parameters.
That means a partial match is fixed by the user before the proper matching pro-
cedure starts. Each production is given by its name followed by the left and the
right-hand side as well as a partial mapping from left to right given by numbers.
From this partial mapping the gluing graph K can be deduced being the domain
of the mapping. Parameters are m, p, k, ptype, x and y. We use a graph notation
similar to Fig. 3.



[Figure: AGG production addParameter; figure not reproduced. Recoverable labels: node 2 of type Method with attributes mname=m and noOfPars=n, node 1 of type Class with attribute cname=ptype.]

AGG productions are restricted concerning the attribution of the left-hand
sides. To avoid the computation of most general unifiers of two general terms,
nodes and edges of left-hand sides are allowed to be attributed only by constants
and variables. This restriction is not a real one, since attribute conditions may
be used. A term in the left-hand side is equivalent to a new variable and a new
attribute condition stating the equivalence of the term and this new variable.

Productions addMethod and addClass with empty left-hand side and a sin-
gle method respectively class on the right-hand side are not shown. Together
with production addParameter they are necessary to build up method signa-
tures. A new parameter is inserted as last one in the parameter list. Production
checkNewParameter checks if a newly inserted parameter is already in the list.
In this case it is removed. Production exchangeParameter is useful for changing
the order of parameters in the list.

The start graph S is empty, i.e. $S = \emptyset$, while data signature DSIG and type
graph ATG have already been given in Ex. 1. Summarizing, the typed attributed
graph transformation system is given by MethodModelling = (DSIG, ATG,
S, P) with P = {addMethod, addClass, addParameter, exchangeParameter,
checkNewParameter}. □



In the following we show that the basic results known in the classical theory 
of graph transformation in [1] and in the theory of HLR systems in [10] are also 
valid for typed attributed graph transformation systems. 

The Local Church-Rosser Theorem states that direct transformations $G \stackrel{p_1,m_1}{\Longrightarrow} H_1$
and $G \stackrel{p_2,m_2}{\Longrightarrow} H_2$ can be extended by direct transformations $H_1 \stackrel{p_2,m_2'}{\Longrightarrow} X$ and
$H_2 \stackrel{p_1,m_1'}{\Longrightarrow} X$ leading to the same X, provided that they are parallel indepen-
dent. Parallel independence means that the matches $m_1$ and $m_2$ overlap only in
common gluing items, i.e. $m_1(L_1) \cap m_2(L_2) \subseteq m_1(l_1(K_1)) \cap m_2(l_2(K_2))$.
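
The set-theoretic condition for parallel independence can be checked directly once the matches are given as finite maps. The following sketch works on the level of item names only; the function name, its argument layout and the small example with an attribute edge `A` are illustrative assumptions.

```python
def parallel_independent(L1, glued1, m1, L2, glued2, m2):
    """Check m1(L1) ∩ m2(L2) ⊆ m1(l1(K1)) ∩ m2(l2(K2)).

    L_i:     set of items (nodes and edges) of the left-hand side L_i
    glued_i: the subset l_i(K_i) of L_i that is preserved by production p_i
    m_i:     dict mapping the items of L_i to items of G
    """
    overlap = {m1[x] for x in L1} & {m2[x] for x in L2}
    preserved = {m1[x] for x in glued1} & {m2[x] for x in glued2}
    return overlap <= preserved


# Both productions match the same method node "m" of G.
# In the first pair both also delete the same attribute edge "attr".
L_a, glued_a, match_a = {"M", "A"}, {"M"}, {"M": "m", "A": "attr"}
L_b, glued_b, match_b = {"M", "A"}, {"M"}, {"M": "m", "A": "attr"}
# A third production only reads the method node and preserves it.
L_c, glued_c, match_c = {"M"}, {"M"}, {"M": "m"}
```

The first pair is not parallel independent because the shared attribute edge is deleted by both productions, while the pair consisting of the first and third production overlaps only in the preserved node.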

The Parallelism Theorem states that in the case of parallel independence
we can apply the parallel production $p_1 + p_2 = (L_1 + L_2 \xleftarrow{l_1 + l_2} K_1 + K_2 \xrightarrow{r_1 + r_2}
R_1 + R_2)$ in one step $G \stackrel{p_1 + p_2}{\Longrightarrow} X$ from G to X. Vice versa each such direct
parallel derivation can be sequentialized in any order leading to two sequential
independent sequences $G \stackrel{p_1,m_1}{\Longrightarrow} H_1 \stackrel{p_2,m_2'}{\Longrightarrow} X$ and $G \stackrel{p_2,m_2}{\Longrightarrow} H_2 \stackrel{p_1,m_1'}{\Longrightarrow} X$.

The case of general sequences, which may be sequentially dependent, is han-
dled by the Concurrency Theorem. Roughly speaking, for each sequence
$G \stackrel{p_1,m_1}{\Longrightarrow} H_1 \stackrel{p_2,m_2}{\Longrightarrow} X$ there is a production $p_1 * p_2$, called concurrent production, which
allows us to construct a direct transformation $G \stackrel{p_1 * p_2}{\Longrightarrow} X$ and vice versa, leading,
however, only to one sequentialization.



Theorem 3 (Local Church-Rosser, Parallelism and Concurrency The- 
orem). The Local Church-Rosser Theorems I and II, the Parallelism Theorem 
and the Concurrency Theorem as stated in [10] are valid for each graph trans- 
formation system based on (AGraphsATG, M). 



Proof idea. The Local Church-Rosser, Parallelism and Concurrency Theorems
are verified for HLR2 categories in [10] and they are shown for adhesive HLR
systems in [8], where only the Parallelism Theorem requires in addition the exis-
tence of binary coproducts compatible with M. Compatibility with M means that
$f, g \in M$ implies $f + g \in M$. In Thm. 2 we have shown that (AGraphsATG, M)
is an adhesive HLR category. Binary coproducts compatible with M can be con-
structed already in AGSIG-Alg with well-structured AGSIG and transferred
to AGraphsATG by Thm. 1. In (AGSIG-Alg, M) binary coproducts can be
constructed separately for the DSIG-part and componentwise in Sets for all
sorts $s \in S_G \setminus S_D$, which implies compatibility with M. If AGSIG is not well-
structured we still have binary coproducts in AGSIG-Alg, as in any category
of algebras, but they may not be compatible with M. □

The next basic result in the classical theory of graph transformation systems 
is the Embedding Theorem (see [1]) in the framework of adhesive HLR systems. 
The main idea of the Embedding Theorem according to [1] is to show under which
conditions a transformation $t : G_0 \stackrel{*}{\Rightarrow} G_n$ can be extended to a transformation
$t' : G'_0 \stackrel{*}{\Rightarrow} G'_n$ for a given “embedding” morphism $k_0 : G_0 \to G'_0$. In the case
of typed attributed graph transformation we consider the following class M' of
“graph part embeddings”: M' consists of all morphisms $k_0$ where the E-graph
part of $k_0$ is injective except for data nodes (see Def. 1-3). This means that the
algebra part of $k_0$ is not restricted to be injective.

Similar to the graph case it is also possible in the case of typed attributed
graphs to construct a boundary B and context C leading to a pushout (1) over
$k_0$ in Fig. 4 with $b_0 \in M$, i.e. $G'_0$ is the gluing of $G_0$ and context C along the
boundary B. This boundary-context pushout (1) over $k_0$ turns out to be an
initial pushout over $k_0$ in the sense of [8].

Now the morphism $k_0 \in M'$ is called consistent with respect to the trans-
formation $t : G_0 \stackrel{*}{\Rightarrow} G_n$, if the boundary B is “preserved” by t leading to a
morphism $b_n : B \to G_n \in M$ in Fig. 4. (For a formal notion of consistency see
[8].)

The following Embedding and Extension Theorem shows that consistency is
necessary and sufficient in order to extend $t : G_0 \stackrel{*}{\Rightarrow} G_n$ for $k_0 : G_0 \to G'_0$ to
$t' : G'_0 \stackrel{*}{\Rightarrow} G'_n$.

Theorem 4 (Embedding and Extension Theorem). Let GTS be a typed
attributed graph transformation system based on (AGraphsATG, M) and M'
the class of all graph part embeddings defined above. Given a transformation
$t : G_0 \stackrel{*}{\Rightarrow} G_n$ and a morphism $k_0 : G_0 \to G'_0 \in M'$ with boundary-context pushout
(1) over $k_0$ we have: The transformation t can be extended to a transformation
$t' : G'_0 \stackrel{*}{\Rightarrow} G'_n$ with morphism $k_n : G_n \to G'_n \in M'$ leading to diagram (2), called
extension diagram, and a boundary-context pushout (3) over $k_n$ if and only if
the morphism $k_0$ is consistent with respect to t.

Proof idea. This theorem follows from the Embedding and Extension Theorems
in [8] shown for adhesive HLR systems over an adhesive HLR category (C, M).
It requires initial pushouts over M'-morphisms for some class M', which is closed
under pushouts and pullbacks along M-morphisms. By Thm. 2 we know that
(AGraphsATG, M) is an adhesive HLR category. In addition it can be shown
that for each $k_0 \in M'$, where M' is the class of all graph part embeddings, there
is a boundary-context pushout (1) over $k_0$, which is already an initial pushout
over $k_0$ in the sense of [8]. Moreover it can be shown that M' is closed under
pushouts and pullbacks along M-morphisms. □

The Embedding and Extension Theorems are used in [8] to show the Local 
Confluence Theorem, also known as critical pair lemma, in the framework of 
adhesive HLR systems, where in addition to initial pushouts also the existence 
of an E'-M' pair factorization is used.

Definition 8 (E'-M' pair factorization). Let E' be a class of morphism pairs
$(e_1, e_2)$ with the same codomain and M' the class of all graph part embeddings
defined above. We say that a typed attributed graph transformation system based
on (AGraphsATG, M) has E'-M' pair factorization, if for each pair of matches
$f_1 : L_1 \to G$, $f_2 : L_2 \to G$ there is a pair $e_1 : L_1 \to K$, $e_2 : L_2 \to K$ with
$(e_1, e_2) \in E'$ and a morphism $m : K \to G$ with $m \in M'$ such that $m \circ e_1 = f_1$
and $m \circ e_2 = f_2$.




Remark 5. For simplicity we have fixed M' to be the class of all graph part
embeddings, which implies that M' is closed under pushouts and pullbacks along
M-morphisms as required for E'-M' pair factorization in [8] with general class
M'.

Example 3. 1. Let E' be the class of jointly surjective morphisms in
AGraphsATG with same codomain. Given $f_1$ and $f_2$ we obtain an induced
morphism $f_{12} : L_1 + L_2 \to G$ with coproduct injections $i_1 : L_1 \to L_1 + L_2$
and $i_2 : L_2 \to L_1 + L_2$. Now let $f_{12} = m \circ e$ be an epi-mono factorization of
$f_{12}$ leading to $e_1 = e \circ i_1$ and $e_2 = e \circ i_2$ with $(e_1, e_2) \in E'$. In this case
$m : K \to G$ is injective and the data type part of K is a quotient term
algebra $T_\Sigma(X_1 + X_2)|_{\equiv}$, where $T_\Sigma(X_1)$ and $T_\Sigma(X_2)$ are the algebras of $L_1$
and $L_2$ respectively. This corresponds to one possible choice of congruence
$\equiv$ considered in [5].

2. In order to obtain a minimal number of critical pairs it is essential to consider
also the case, where $\equiv$ is the trivial congruence with $T_\Sigma(X) = T_\Sigma(X)|_{\equiv}$. In
fact, a most general unifier construction $\sigma_n : X \to T_\Sigma(X)$ considered in [5]
leads to a different E'-M' pair factorization of $f_1$, $f_2$ with $(e_1, e_2) \in E'$,
$m \in M'$, $e_1 : L_1 \to K$, $e_2 : L_2 \to K$ and $m : K \to G$, where $(e_1, e_2)$ is jointly
surjective for non-data nodes, the data type part of K is $T_\Sigma(X)$ and m is
injective for non-data nodes, where non-data nodes are all nodes and edges,
which are not data nodes (see Def. 1).
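
On the level of a single carrier set, the jointly surjective factorization of two matches is simply the union of their images. The following sketch shows this for finite maps; it ignores the data type part (quotient term algebra or unifier) completely, and the function name is a hypothetical choice.

```python
def jointly_surjective_factorization(f1, f2):
    """Factor f1 : L1 -> G and f2 : L2 -> G through K = f1(L1) ∪ f2(L2).

    Returns (K, e1, e2, m) with m∘e1 = f1 and m∘e2 = f2, where the pair
    (e1, e2) is jointly surjective and m is the inclusion of K into G.
    """
    K = set(f1.values()) | set(f2.values())
    e1 = dict(f1)              # f1 with codomain restricted to K
    e2 = dict(f2)              # f2 with codomain restricted to K
    m = {x: x for x in K}      # injective inclusion K -> G
    return K, e1, e2, m


# Two matches into a graph G that share the method node "m"
f1 = {"M": "m", "P": "p1"}
f2 = {"M": "m", "P": "p2"}
K, e1, e2, m = jointly_surjective_factorization(f1, f2)
```

The set `K` plays the role of the overlapping graph of the two left-hand sides: items of G outside both images, such as an unrelated class node, do not appear in it.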

Dependent on the choice of an E'-M' pair factorization we are now able to
define critical pairs and strict confluence.

Definition 9 (critical pair and strict confluence). Given an E'-M' pair
factorization, a critical pair is a pair of non-parallel independent direct transfor-
mations $P_1 \stackrel{p_1,o_1}{\Longleftarrow} K \stackrel{p_2,o_2}{\Longrightarrow} P_2$ such that $(o_1, o_2) \in E'$ for the corresponding matches
$o_1$ and $o_2$. The critical pair is called strictly confluent, if we have

1. Confluence: There are transformations $P_1 \stackrel{*}{\Rightarrow} K'$ and $P_2 \stackrel{*}{\Rightarrow} K'$.

2. Strictness: Let N be the common subobject of K, which is preserved by the
direct transformations $K \Rightarrow P_1$ and $K \Rightarrow P_2$ of the critical pair, then N
is also preserved by $P_1 \stackrel{*}{\Rightarrow} K'$ and $P_2 \stackrel{*}{\Rightarrow} K'$, such that the restrictions of the
transformations $K \Rightarrow P_1 \stackrel{*}{\Rightarrow} K'$ and $K \Rightarrow P_2 \stackrel{*}{\Rightarrow} K'$ yield the same morphism
$N \to K'$. (See [8] for a more formal version of strict confluence.)

Theorem 5 (Local Confluence Theorem - Critical Pair Lemma). Given
a typed attributed graph transformation system GTS based on (AGraphsATG,
M), and an E'-M' pair factorization, where M' is the class of all graph part
embeddings, then GTS is locally confluent, if all its critical pairs are strictly
confluent.

Local confluence of GTS means that for each pair of direct transformations
$H_1 \Leftarrow G \Rightarrow H_2$ there are transformations $H_1 \stackrel{*}{\Rightarrow} X$ and $H_2 \stackrel{*}{\Rightarrow} X$.

Proof idea. This theorem follows from the Local Confluence Theorem in [8]
shown for adhesive HLR systems over (C, M). It requires initial pushouts over
M'-morphisms for a class M' “compatible” with M. The proof in [8] is based
on completeness of critical pairs shown by using an M-M' pushout pullback de-
composition property. In our case M' is the class of all graph part embeddings,
which can be shown to satisfy this property, which implies that M' is “compat-
ible” with M. In the proof idea of Thm. 4 we have discussed already how to
verify the remaining properties which are required in the general framework of
[8]. □

Example 4 (critical pairs). Considering our typed attributed graph transforma-
tion system MethodModelling (see Ex. 2) we now analyse its critical pairs. This
analysis is supported by AGG. Due to the restriction of attributes in the left-
hand side of a production to constants and variables we restrict ourselves to very
simple congruence relations on terms of overlapping graphs. Variables may be
identified with constants or with other variables. All other congruences are the
identities. In the following, the AGG user interface for critical pair analysis is
shown. It presents an overview on all critical pairs in form of a table containing
all possible pairs of productions of MethodModelling at the bottom of the figure.
Applying for example first addParameter, exchangeParameter or checkNewParameter
and second checkNewParameter leads to critical pairs. We consider the two
critical pairs of productions addParameter and checkNewParameter closer.
On the top of the figure the left-hand sides of both productions are displayed.
The center shows the two overlapping graphs which lead to critical pairs. Both
overlapping graphs show the conflict on attribute ‘noOfPars’ which is increased
by production addParameter but decreased by checkNewParameter. In the
left graph both classes are identified, i.e. the new parameter would be of the
same type as the two already existing ones which are equal, while the right
graph shows two classes.




Both critical pairs of productions addParameter and checkNewParameter
are strictly confluent. Applying first addParameter(mname, “nl”, cname, “in”)
and then exchangeParameter(n, n + 1) and checkNewParameter() yields a
graph isomorphic to the one obtained by applying first checkNewParameter() and then
addParameter(mname, “nl”, cname, “in”). The common subgraph N of the
conflicting transformations is the result graph of applying checkNewParameter to the
overlapping graph.

5 Related Work and Conclusions 

A variety of approaches to attributed graph transformation [4-6, 11, 12, 16] has
already been developed where attributed graphs consist of a graph part and a
data part. These approaches are compared with ours in the following. More-
over, general forms of algebraic structures and their transformation have been
considered in e.g. [17-19].

A simple form of attributed graphs, where only nodes are attributed, is cho- 
sen in [16] and [5]. In [16], nodes are directly mapped to data values which 
makes relabelling difficult. In [5], nodes are attributed by special edges which 
are deleted and newly created in order to change attributes. We have defined
attributed graphs in the spirit of the latter notion of node-attributed graphs, but
extended the notion also to edge attribution. In [4,11], attributed graph struc-
tures are formulated by algebras of a special signature where the graph part and
the data part are separate from each other and only connected by attribution
operations. The graph part comprises so-called attribute carriers which play
the same role as attribution edges. They assign attributes to graph elements
and allow relabelling. That is, attributed graph signatures can also be formalized
by well-structured AGSIG-algebras. In [4,11], they are transformed using the
single-pushout approach. In [12] and [6] this attribution concept is extended by
allowing partial algebras. Using e.g. partial attribution operations, no carriers
or edges are needed for attribution. This leads to a slightly more compact no-
tion, but makes the formalization more difficult. We are convinced that
attributed graphs as defined in Sec. 2 are a good compromise between the ex-
pressiveness of the attribution concept on one hand and the complexity of the
formalism on the other.

The theory we provide in this paper includes fundamental results for graph
transformation which are now available for typed attributed graph transformation
in the sense of Sections 2 and 4. The general strategy to extend the theory
for typed attributed graph transformation is to formulate the corresponding
results in adhesive HLR categories and to verify additional HLR properties for the
category ($\mathrm{AGraphs}_{ATG}$, $\mathcal{M}$), if they are required. But the theory presented in
Sec. 4 is also valid in the context of well-structured attributed graph structure
signatures AGSIG, which correspond to attributed graph signatures in [4, 11].
In fact, the HLR conditions required for all the results in [8] are already valid
for the category (AGSIG-Alg, $\mathcal{M}$) with well-structured AGSIG. This means
that our theory is also true for attributed graph transformation based on the
adhesive HLR category (AGSIG-Alg, $\mathcal{M}$) with general AGSIG for Thm. 3 and
well-structured AGSIG for Thm. 4 and 5.

Future work is needed to obtain corresponding results for extensions of typed 
attributed graph transformation by further concepts such as application condi- 
tions for productions or type graphs with inheritance. 

References 

1. Ehrig, H.: Introduction to the Algebraic Theory of Graph Grammars (A Survey).
In: Graph Grammars and their Application to Computer Science and Biology.
Volume 73 of LNCS. Springer (1979) 1-69

2. Ehrig, H., Engels, G., Kreowski, H.J., Rozenberg, G., eds.: Handbook of Graph
Grammars and Computing by Graph Transformation, Volume 2: Applications,
Languages and Tools. World Scientific (1999)







3. Ehrig, H., Kreowski, H.J., Montanari, U., Rozenberg, G., eds.: Handbook of Graph
Grammars and Computing by Graph Transformation, Volume 3: Concurrency,
Parallelism and Distribution. World Scientific (1999)

4. Löwe, M., Korff, M., Wagner, A.: An Algebraic Framework for the Transformation
of Attributed Graphs. In: Term Graph Rewriting: Theory and Practice. John
Wiley and Sons Ltd. (1993) 185-199

5. Heckel, R., Küster, J., Taentzer, G.: Confluence of Typed Attributed Graph
Transformation with Constraints. In: Proc. ICGT 2002. Volume 2505 of LNCS.
Springer (2002) 161-176

6. Berthold, M., Fischer, I., Koch, M.: Attributed Graph Transformation with Partial
Attribution. Technical Report 2000-2 (2000)

7. Ermel, C., Rudolf, M., Taentzer, G.: The AGG-Approach: Language and Tool
Environment. In Ehrig, H., Engels, G., Kreowski, H.J., Rozenberg, G., eds.:
Handbook of Graph Grammars and Computing by Graph Transformation, Volume 2.
World Scientific (1999) 551-603

8. Ehrig, H., Habel, A., Padberg, J., Prange, U.: Adhesive High-Level Replacement
Categories and Systems. In: Proc. ICGT 2004. LNCS, Springer (2004) (this
volume)

9. Lack, S., Sobociński, P.: Adhesive Categories. In: Proc. FOSSACS 2004. Volume
2987 of LNCS. Springer (2004) 273-288

10. Ehrig, H., Habel, A., Kreowski, H.J., Parisi-Presicce, F.: Parallelism and
Concurrency in High-Level Replacement Systems. Math. Struct. in Comp. Science 1
(1991) 361-404

11. Claßen, I., Löwe, M.: Scheme Evolution in Object Oriented Models: A Graph
Transformation Approach. In: Proc. Workshop on Formal Methods at the ISCE’95,
Seattle (U.S.A.). (1995)

12. Fischer, I., Koch, M., Taentzer, G., Volle, V.: Distributed Graph Transformation
with Application to Visual Design of Distributed Systems. In Ehrig, H., Kreowski,
H.J., Montanari, U., Rozenberg, G., eds.: Handbook of Graph Grammars and
Computing by Graph Transformation, Volume 3. World Scientific (1999) 269-340

13. Bardohl, R.: A Visual Environment for Visual Languages. Science of Computer
Programming (SCP) 44 (2002) 181-203

14. Ehrig, H., Prange, U., Taentzer, G.: Fundamental Theory for Typed Attributed
Graph Transformation: Long Version. Technical Report TU Berlin. (2004)

15. Ehrig, H., Mahr, B.: Fundamentals of Algebraic Specification 1: Equations and
Initial Semantics. Volume 6 of EATCS Monographs on TCS. Springer, Berlin
(1985)

16. Schied, G.: Über Graphgrammatiken, eine Spezifikationsmethode für
Programmiersprachen und verteilte Regelsysteme. Arbeitsber. des Inst. für math.
Maschinen und Datenverarbeitung, PhD Thesis, University of Erlangen (1992)

17. Wagner, A.: A Formal Object Specification Technique Using Rule-Based
Transformation of Partial Algebras. PhD thesis, TU Berlin (1997)

18. Llabrés, M., Rosselló, F.: Pushout Complements for Arbitrary Partial Algebras. In
Ehrig, H., Engels, G., Kreowski, H.J., Rozenberg, G., eds.: Theory and Applications
of Graph Transformation. Volume 1764. Springer (2000) 131-144

19. Große-Rhode, M.: Semantic Integration of Heterogeneous Software Specifications.
EATCS Monographs on Theoretical Computer Science. Springer, Berlin (2004)




Parallel Independence 
in Hierarchical Graph Transformation 



Annegret Habel 1 and Berthold Hoffmann 2 



1 Carl-v.-Ossietzky-Universität Oldenburg, Germany
habel@informatik.uni-oldenburg.de
2 Universität Bremen, Germany
hof@informatik.uni-bremen.de



Abstract. Hierarchical graph transformation as defined in [1, 2] ex- 
tends double-pushout graph transformation in the spirit of term rewrit- 
ing: Graphs are provided with hierarchical structure, and transformation 
rules are equipped with graph variables. In this paper we analyze conditions
under which diverging transformation steps $H \Leftarrow G \Rightarrow H'$ can be
joined by subsequent transformation sequences $H \Rightarrow^{*} M \;{}^{*}\!\Leftarrow H'$.
Conditions for joinability have been found for graph transformation (called
parallel independence) and for term rewriting (known as non-critical 
overlap). Both conditions carry over to hierarchical graph transforma- 
tion. Moreover, the more general structure of hierarchical graphs and 
of transformation rules leads to a refined condition, termed fragmented 
parallel independence, which subsumes both parallel independence and 
non-critical overlap as special cases. 



1 Introduction 

Graph transformation combines two notions that are ubiquitous in computer 
science (and beyond). Graphs are frequently used as visual models of structured 
data that consists of entities with relationships between them. Rules allow the 
modification of data to be specified in an axiomatic way. The book [3] gives a 
general survey on graph transformation, and [4, 5] describe several application 
areas. 

When graph transformation is used to program or specify systems, it should 
be possible to group large graphs in a hierarchical fashion so that they stay 
comprehensible. Many notions of hierarchical graphs have been proposed, and 
several ways of transforming hierarchical graphs have been studied in the liter- 
ature. See [6] for a rather general definition. This paper is based on [1], where 
double-pushout graph transformation [7] has been extended to a strict kind of 
hierarchical graphs where the hierarchy is a tree, and edges may not connect 
nodes in different parts of the hierarchy. This is adequate for programming; 
applications like software modeling may call for a looser notion of hierarchi- 
cal graphs, e.g., the one in [8]. In [2], transformation rules have been extended 
to rules with variables [9]. This is done in the spirit of term rewriting [10], a 
rule-based model for computing with expressions (trees): Rules are equipped 
with variables that may be instantiated by graphs, so that a single rule
application may compare, delete, or copy subgraphs of arbitrary size. Hierarchical
graph transformation with variables is the computational model of DiaPlan, 
a language for programming with graphs and diagrams that is currently being 
designed [11]. 

In general, graph transformation is nondeterministic like other rule-based 
systems. Several rules may compete for being applied, at different places in a 
given graph. It is thus important to study under which conditions the result 
of a transformation sequence is independent of the order in which competing 
rules are applied. For term rewriting, joinability holds if steps have
a non-critical overlap [10], and for double-pushout graph transformation, the
slightly stronger property of direct joinability holds if steps are parallelly
independent [12]. These results carry over to hierarchical graph transformation. More
precisely, we shall prove that they are special cases of the Fragmented Parallel 
Independence Theorem. 

The paper is organized as follows. Section 2 collects basic notions of graphs 
and graph morphisms. In Sect. 3, we recall the basic notions of hierarchical 
graphs and hierarchical graph transformation, and show the relationship to 
substitution-based graph transformation. In Sect. 4, we discuss how indepen- 
dence results from graph transformation and term rewriting carry over to hier- 
archical graph transformation, and establish the Fragmented Parallel Indepen- 
dence Theorem. In Sect. 5, we conclude with a brief summary and with some 
topics for future work. 

2 Preliminaries

In the following, we recall standard notions of graphs and graph morphisms [7]. 
As in [9], we distinguish a subset of the label alphabet as variables. Variable 
edges are placeholders that can be substituted by graphs. 

Let $\mathcal{C}$ be an alphabet with a subset $X \subseteq \mathcal{C}$ of variables where every symbol $l$
comes with a rank $\mathit{rank}(l) \geq 0$.

A graph (with variables in $X$) is a system $G = (V_G, E_G, \mathit{att}_G, \mathit{lab}_G)$ with
finite sets $V_G$ and $E_G$ of nodes (or vertices) and edges, an attachment function
$\mathit{att}_G: E_G \to V_G^{*}$ (see footnote 1), and a labeling function $\mathit{lab}_G: E_G \to \mathcal{C}$ such that the
attachment $\mathit{att}_G(e)$ of every edge $e$ consists of $\mathit{rank}(\mathit{lab}_G(e))$ nodes (that need not be
distinct).

1 For a set $A$, $A^{*}$ denotes the set of all sequences over $A$. The empty sequence is
denoted by $\varepsilon$. For a mapping $f: A \to B$, $f^{*}: A^{*} \to B^{*}$ denotes the extension of $f$
with $f^{*}(\varepsilon) = \varepsilon$ and $f^{*}(a_1 \ldots a_k) = f(a_1) \ldots f(a_k)$ for $a_1, \ldots, a_k \in A$.

A graph morphism $g: G \to G'$ between two graphs $G$ and $G'$ consists of two
functions $g_V: V_G \to V_{G'}$ and $g_E: E_G \to E_{G'}$ that preserve labels and attachments,
that is, $\mathit{lab}_{G'} \circ g_E = \mathit{lab}_G$ and $\mathit{att}_{G'} \circ g_E = g_V^{*} \circ \mathit{att}_G$. It is injective (surjective)
if $g_V$ and $g_E$ are injective (surjective), and an isomorphism if it is both injective
and surjective. It is an inclusion if $g_V$ and $g_E$ are inclusions.
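
The definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `Graph` and the helper names `well_formed`, `is_morphism`, and `is_injective` are hypothetical and do not come from the paper, and graphs are represented naively by a node set plus two dictionaries for attachment and labeling.

```python
from dataclasses import dataclass


@dataclass
class Graph:
    nodes: set   # the node set V_G
    att: dict    # attachment: edge -> tuple of attached nodes
    lab: dict    # labeling: edge -> label


def well_formed(g, rank):
    """Each edge carries a label and exactly rank(label) attached nodes of g."""
    return set(g.att) == set(g.lab) and all(
        len(g.att[e]) == rank[g.lab[e]] and set(g.att[e]) <= g.nodes
        for e in g.att
    )


def is_morphism(gv, ge, g, h):
    """Check that node map gv and edge map ge preserve labels and attachments."""
    if set(gv) != g.nodes or set(ge) != set(g.att):
        return False  # both maps must be total on g
    if not (set(gv.values()) <= h.nodes and set(ge.values()) <= set(h.att)):
        return False  # both maps must land in h
    return all(
        h.lab[ge[e]] == g.lab[e]
        and h.att[ge[e]] == tuple(gv[v] for v in g.att[e])
        for e in g.att
    )


def is_injective(gv, ge):
    """A morphism is injective if both component maps are injective."""
    return len(set(gv.values())) == len(gv) and len(set(ge.values())) == len(ge)
```

The attachment check applies the node map pointwise to the attachment sequence, which is the role played by the extension of the node map to sequences in the definition.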

3 Hierarchical Graph Transformation 

In this section, we define hierarchical graphs, hierarchical graph morphisms, and
hierarchical graph transformation. For lack of space, we just recall the concepts 
devised in [1,2]; a broader discussion of these concepts, and further references 
to the scientific literature can be found in these papers. At the end of the sec- 
tion, we relate our definitions to their origins, namely double-pushout graph 
transformation [7], and substitutive graph transformation [9]. 

A graph becomes hierarchical if its edges contain graphs, the edges of which 
may contain graphs again, in a nested fashion. Variables may not contain graphs; 
they are used as placeholders for graphs in transformation rules. 

Definition 1 (Hierarchical Graph). The set $\mathcal{H}(X)$ of hierarchical graphs
(with variables in $X$) consists of triples $H = (\hat{H}, F_H, \mathit{cts}_H)$ where $\hat{H}$ is a graph
(with variables in $X$), the top graph of $H$, $F_H \subseteq E_{\hat{H}}$ is a set of frame edges (or just frames) that are
labeled in $\mathcal{C} \setminus X$, and $\mathit{cts}_H: F_H \to \mathcal{H}(X)$ is a contents function mapping frames
to hierarchical graphs.

A hierarchical graph $I$ is a part of $H$ if $I = H$, or if $I$ is a part of $\mathit{cts}_H(f)$ for
some frame $f \in F_H$. An $X$-labeled edge in some part $I$ of $H$ is called a variable
edge of $H$.

The skeleton of a hierarchical graph $H$ is obtained by removing all variable
edges from all parts of $H$; it is denoted by $\underline{H}$. $\mathit{Var}(H)$ denotes the set of variables
occurring in the parts of $H$. A hierarchical graph $H$ is variable-free if $H = \underline{H}$.
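
As a rough illustration of the nesting, the skeleton, and the variable set, the following Python sketch may help. The class `HGraph` and the functions `skeleton`, `var`, and `is_variable_free` are hypothetical names chosen for this example; the set of variable labels is passed explicitly as `variables`, and frames are assumed to carry non-variable labels as the definition requires.

```python
from dataclasses import dataclass, field


@dataclass
class HGraph:
    nodes: set
    att: dict                                 # edge -> tuple of attached nodes
    lab: dict                                 # edge -> label
    cts: dict = field(default_factory=dict)   # frame edge -> contained HGraph


def skeleton(h, variables):
    """Remove all variable-labeled edges from every part of h."""
    keep = [e for e in h.att if h.lab[e] not in variables]
    return HGraph(
        nodes=set(h.nodes),
        att={e: h.att[e] for e in keep},
        lab={e: h.lab[e] for e in keep},
        # Frames are never variable-labeled, so every frame survives.
        cts={f: skeleton(c, variables) for f, c in h.cts.items()},
    )


def var(h, variables):
    """Collect the variable labels occurring in any part of h."""
    found = {h.lab[e] for e in h.att if h.lab[e] in variables}
    for contents in h.cts.values():
        found |= var(contents, variables)
    return found


def is_variable_free(h, variables):
    """h is variable-free exactly when no variable occurs in its parts."""
    return not var(h, variables)
```

Taking the skeleton twice gives the same result as taking it once, which matches the reading of variable-freeness as equality with the skeleton.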

Example 1 (Control flow graphs). In simple control flow diagrams of sequential 
imperative programs, execution states are represented by nodes (depicted as 
small circles), and execution steps are represented by edges: statements (drawn 
as boxes) are labeled by assignments, and branches (drawn as diamonds) are 
labeled by conditions. Each step connects one predecessor state to one successor 
state (for assignments), or to two (for branches, distinguished by “⊕” and “⊖”,
respectively). Hierarchies are used for representing procedure calls (drawn like 
assignments, but with doubled vertical lines). They contain control flow graphs 
of the procedures’ bodies. Since procedures may call other procedures, control 
flow graphs may be nested to arbitrary depth. In Fig. 7 below we show six 
hierarchical control flow graphs. 

Definition 2 (Hierarchical Graph Morphism). A top hierarchical graph
morphism (top morphism, for short) $h: H \to H'$ between two hierarchical graphs
$H$ and $H'$ is a pair $h = (\hat{h}, M)$ where $\hat{h}: \hat{H} \to \hat{H}'$ is a graph morphism between the top graphs such that
$\hat{h}(F_H) \subseteq F_{H'}$, and $M = (h_f: \mathit{cts}_H(f) \to \mathit{cts}_{H'}(\hat{h}(f)))_{f \in F_H}$ is a family of top
morphisms between the contents of the frames. A hierarchical graph morphism
$h: H \to H'$ is a top morphism $h': H \to H''$ between $H$ and some part $H''$ of $H'$.
A hierarchical graph morphism $h$ is injective if the graph morphism $\hat{h}$ and all
morphisms in $M$ are injective; it is an inclusion if $\hat{h}$ and all morphisms in $M$
are inclusions. A top morphism $h$ is surjective if $\hat{h}$ and all morphisms in $M$ are
surjective. A top morphism $h: H \to H'$ is an isomorphism if it is injective and
surjective; then we call $H$ and $H'$ isomorphic, and write $H \cong H'$.

Definition 3 (Substitution). A substitution pair $x \mapsto (H, p)$ consists of a
variable $x \in X$ and of a hierarchical graph $H$ with a sequence $p \in V_{\hat{H}}^{*}$ of $\mathit{rank}(x)$
mutually distinct points. A finite set

$$\sigma = \{x_1 \mapsto (H_1, p_1), \ldots, x_n \mapsto (H_n, p_n)\}$$

of substitution pairs is a substitution if the variables are pairwise distinct. Then
$\mathit{Dom}(\sigma) = \{x_1, \ldots, x_n\}$ is the domain of $\sigma$.

Let $I$ be a hierarchical graph where the top graph $\hat{I}'$ of some part $I'$ contains
an edge $e$ labeled with $x$. Then the application of a substitution pair $x \mapsto (H, p)$
to $e$ is obtained by replacing $I'$ with a hierarchical graph constructed
as follows: Unite $\hat{I}'$ disjointly with $\hat{H}$, remove $e$, identify every point in $p$ with
the corresponding attached node in $\mathit{att}_{\hat{I}'}(e)$, and preserve the contents of the
frames. The instantiation of a hierarchical graph $I$ according to a substitution
$\sigma$ is obtained by applying all substitution pairs in $\sigma$ to all edges with a variable
label in $\mathit{Dom}(\sigma)$ simultaneously, and is denoted by $I\sigma$.
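
The application of a single substitution pair can be sketched as follows. The sketch is deliberately restricted to flat graphs (frame contents are ignored), and the function name `apply_pair` and the fresh-name scheme `("r", e, ...)` are assumptions made for this illustration, not notation from the paper. Fresh names are assumed not to clash with items already present in the host graph.

```python
def apply_pair(nodes, att, lab, e, repl_nodes, repl_att, repl_lab, points):
    """Replace the variable edge e by a graph with the given points.

    Returns the node set, attachment map, and label map of the result.
    """
    # One distinct point per attached node of e.
    assert len(points) == len(att[e]) and len(set(points)) == len(points)
    # Identify every point with the corresponding attached node of e;
    # all other nodes of the replacement graph receive fresh names.
    glue = dict(zip(points, att[e]))
    rename = {v: glue.get(v, ("r", e, v)) for v in repl_nodes}
    new_nodes = set(nodes) | set(rename.values())
    # Remove e from the host graph.
    new_att = {k: vs for k, vs in att.items() if k != e}
    new_lab = {k: l for k, l in lab.items() if k != e}
    # Add the edges of the replacement graph under fresh names.
    for k, vs in repl_att.items():
        new_att[("r", e, k)] = tuple(rename[v] for v in vs)
        new_lab[("r", e, k)] = repl_lab[k]
    return new_nodes, new_att, new_lab
```

Instantiating a graph by a whole substitution would call this once for every variable edge whose label lies in the domain; the hierarchical case would additionally carry the frame contents along unchanged.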

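Definition 4 (Rule). A rule $p = (L \leftarrow K \rightarrow R)$ consists of two top morphisms
with a common domain $K$. We assume that $K \to L$ and $K \to R$ are inclusions
and that $\mathit{Var}(L) \supseteq \mathit{Var}(R)$.

The instance of a rule $p$ for a substitution $\sigma$ is defined as $p\sigma = (L\sigma \leftarrow \underline{K} \rightarrow R\sigma)$,
and the skeleton of $p$ is given by $\underline{p} = (\underline{L} \leftarrow \underline{K} \rightarrow \underline{R})$. A rule $p$ is variable-free
if $p = \underline{p}$. (We explain in App. A why we take the skeleton $\underline{K}$ of the interface in
the instance $p\sigma$, instead of $K\sigma$.)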



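Definition 5 (Hierarchical Graph Transformation). Consider hierarchical
graphs $G$ and $H$ and a rule $p = (L \leftarrow K \rightarrow R)$. Then $G$ directly derives $H$
through $p$ if there is a double-pushout

$$
\begin{array}{ccccc}
L\sigma & \leftarrow & \underline{K} & \rightarrow & R\sigma \\
{\scriptstyle g}\downarrow & (1) & \downarrow & (2) & \downarrow \\
G & \leftarrow & D & \rightarrow & H
\end{array}
$$

for some substitution $\sigma$ so that the vertical morphisms are injective. We write
$G \Rightarrow_{p,\sigma,g} H$ or $G \Rightarrow_p H$ and call this a direct derivation where $g$ is the
hierarchical graph morphism $g: L \to G$ defining the occurrence of $p$ in $G$.
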
A direct derivation $G \Rightarrow_{p,\sigma,g} H$ exists if and only if the occurrence $g: L \to G$
above satisfies the following hierarchical dangling condition: (i) The graph
morphism $\hat{g}$ satisfies the dangling condition for graph morphisms (see [7]), (ii)
all morphisms in $M$ satisfy the hierarchical dangling condition, and (iii) for all
deleted frames $f \in F_L \setminus F_K$, the hierarchical morphism $g_f: \mathit{cts}_L(f) \to \mathit{cts}_G(g(f))$
is bijective.
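
Part (i) of the condition is the usual dangling condition of the double-pushout approach: no edge outside the occurrence may be attached to a node that the rule deletes. A minimal sketch for flat graphs and injective occurrences follows; the function name `satisfies_dangling` and the dictionary-based representation are assumptions of this example, and parts (ii) and (iii) are not covered.

```python
def satisfies_dangling(l_nodes, l_edges, k_nodes, gv, ge, g_att):
    """Dangling condition for an injective match (gv, ge) of L in G.

    l_nodes, l_edges: nodes and edges of the left-hand side L
    k_nodes:          nodes of the interface K (a subset of l_nodes)
    g_att:            attachment map of the host graph G
    """
    deleted_nodes = {gv[v] for v in l_nodes - k_nodes}
    matched_edges = {ge[e] for e in l_edges}
    # Every host edge touching a deleted node must itself be matched.
    return all(
        e in matched_edges or not (set(vs) & deleted_nodes)
        for e, vs in g_att.items()
    )
```

With an injective match the identification part of the classical gluing condition holds trivially, which is why only dangling edges are tested here.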

Given a hierarchical graph $G$, a rule $p$ as above, a substitution $\sigma$ with
$\mathit{Dom}(\sigma) = \mathit{Var}(L)$, and an occurrence $g$ satisfying the hierarchical dangling
condition, a direct derivation is uniquely determined by the following steps: (1)
Remove $g(L\sigma - \underline{K})$ from the part $G'$ of $G$ where $g: L \to G'$ is top, yielding a
hierarchical graph $D'$, a top morphism $d: \underline{K} \to D'$ which is the restriction of $g$,
and the inclusion $D' \to G'$. (2) Add $R\sigma$ disjointly to $D'$ and identify the
corresponding nodes and edges in $\underline{K}$ and $d(\underline{K})$, yielding a hierarchical graph $H'$ and
top morphisms $D' \to H'$ and $R\sigma \to H'$. (3) Obtain $H$ by replacing $H'$ for $G'$ in
$G$.
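
Steps (1) and (2) can be sketched for the flat, top-level case as follows. This is an illustration under simplifying assumptions, not the construction of the paper: the rule is assumed to be already instantiated, the interface is included in both sides, the match is injective, frame contents are ignored, and the name `rewrite` as well as the fresh-name scheme `("new", ...)` are hypothetical.

```python
def rewrite(g_nodes, g_att, g_lab, L, K, R, gv, ge):
    """Apply a rule to a flat graph; L, K, R are (nodes, att, lab) triples."""
    l_nodes, l_att, _ = L
    k_nodes, k_att, _ = K
    r_nodes, r_att, r_lab = R
    # Step (1): remove the image of L - K, yielding the intermediate graph D.
    del_nodes = {gv[v] for v in l_nodes - k_nodes}
    del_edges = {ge[e] for e in set(l_att) - set(k_att)}
    d_nodes = g_nodes - del_nodes
    d_att = {e: vs for e, vs in g_att.items() if e not in del_edges}
    if any(set(vs) & del_nodes for vs in d_att.values()):
        raise ValueError("dangling condition violated")
    d_lab = {e: g_lab[e] for e in d_att}
    # Step (2): add R - K under fresh names, glued along the image of K.
    hv = {v: gv[v] if v in k_nodes else ("new", v) for v in r_nodes}
    h_nodes = d_nodes | set(hv.values())
    h_att, h_lab = dict(d_att), dict(d_lab)
    for e, vs in r_att.items():
        if e not in k_att:
            h_att[("new", e)] = tuple(hv[v] for v in vs)
            h_lab[("new", e)] = r_lab[e]
    return h_nodes, h_att, h_lab
```

Step (3), replacing the rewritten part inside the enclosing hierarchical graph, would wrap this function in a traversal of the frame contents.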

Remark 1 (Relation to Adhesive High-Level Replacement). Hierarchical graphs
without variables and injective hierarchical graph morphisms form a category
HiGraphs. We conjecture that the category (HiGraphs, $\mathcal{M}$) of hierarchical graphs
without variables with the class $\mathcal{M}$ of all injective top morphisms forms an
adhesive HLR category. In this case, application of the general results in [13]
would yield the Local Church-Rosser Theorems, the Embedding, Extension, and
Local Confluence Theorem for (HiGraphs, $\mathcal{M}$). The statement no longer holds
for the category (HiGraphs($X$), $\mathcal{M}$) of hierarchical graphs with variables with
the class $\mathcal{M}$ of all injective top morphisms.

Example 2 (Transformation of Control Flow Graphs). In Figs. 1 and 2 we show
two rules for transforming hierarchical control flow graphs. The rule loop removes
duplicated code before a loop. The rule inl performs “inlining” of a procedure’s
body for its call. In the figures, the images of the interface’s nodes and edges in
the left- and right-hand side graphs can be found by horizontal projection.

Figure 7 shows transformations that use the rule loop in vertical direction,
and the rule inl in horizontal direction, respectively. For applying loop, the
variable D must be instantiated with the assignment “x := e” in the right column,
and by the procedure call edge containing that assignment in the other columns.
For applying inl, its variable D must be instantiated with the control flow graphs
representing the statements “x := e” (in the transformations to the right), and
“y := e'; z := e''” (in the transformations to the left), respectively.

A rule is applied by instantiating its variables according to some substitution, 
and constructing a double-pushout for this instance. 

In substitutive graph transformation [9], the application of a rule is deter- 
mined entirely by instantiation with a substitution. 

Definition 6 (Substitutive Graph Transformation). A substitutive rule
$p^{*} = (L^{*}, R^{*})$ is a pair of hierarchical graphs. Given two hierarchical graphs
$G, H$, $G$ directly derives $H$ through $p^{*}$, denoted by $G \Rightarrow_{p^{*}} H$, if there is a
substitution $\sigma^{*}$ such that $L^{*}\sigma^{*} \cong G'$ for some part $G'$ of $G$, and $H$ equals a copy
of $G$ wherein $G'$ is replaced with $R^{*}\sigma^{*}$.

Fig. 1. The rule loop

Fig. 2. The rule inl

Every rule $p = (L \leftarrow K \rightarrow R)$ induces a substitutive rule $p^{*} = (L^{*}, R^{*})$
as follows: Extend every part $K'$ of $K$ by a variable edge that is attached to
all nodes in $V_{\hat{K}'}$ and is labeled with a fresh variable label (of rank $|V_{\hat{K}'}|$), and
obtain the hierarchical graphs $L^{*}$ and $R^{*}$ by inserting this hierarchical graph for
the occurrence of the part $K'$ in $L$ and $R$, respectively.

Theorem 1 (Substitutive Graph Transformation). For all hierarchical
graphs $G$, $H$ and all rules $p$,

$G \Rightarrow_p H$ if and only if $G \Rightarrow_{p^{*}} H$.

Proof Sketch. Without loss of generality, $p$ is a rule with discrete interface $K$.
(Otherwise, if we consider the modified rule $p^{-}$ in which all non-variable edges
are removed from $K$, we easily see that $G \Rightarrow_p H \iff G \Rightarrow_{p^{-}} H$.)

For a start, let us consider the case that the rules are applied on top level.
First, let $G \Rightarrow_{p,\sigma,g} H$ be a direct derivation. Define $\sigma^{*}$ as the extension of $\sigma$ by the
substitution pair $\{x \mapsto D\}$, where $D$ is the intermediate hierarchical graph of the
direct derivation. Then $L^{*}\sigma^{*} \cong G$ and $R^{*}\sigma^{*} \cong H$, and $G \Rightarrow_{p^{*},\sigma^{*}} H$ is a direct
substitutive derivation. Conversely, let $G \Rightarrow_{p^{*},\sigma^{*}} H$ be a direct substitutive
derivation. Then $L^{*}\sigma^{*} \cong G$ and $R^{*}\sigma^{*} \cong H$ for some isomorphisms $g^{*}: L^{*}\sigma^{*} \to G$
and $h^{*}: R^{*}\sigma^{*} \to H$. Let $\sigma$ be the restriction of $\sigma^{*}$ to $\mathit{Dom}(\sigma^{*}) - \{x\}$ and $g$ be
the restriction of $g^{*}$ to $L\sigma$. Then there is a direct derivation $G \Rightarrow_{p,\sigma,g} H$.

Now, let the direct derivations apply to a part $G'$ of $G$. Then both kinds of
direct derivations construct a graph $H$ wherein $G'$ is replaced by a part $H'$ with
a direct top-level derivation.

Theorem 1 shows the close relationship between the double-pushout approach 
and the substitution-based approach. As a consequence, the main proofs can be 
done on a substitution-based level. 

4 Parallel Independence

The term “parallel independence” has been coined for a criterion of commutativ- 
ity (or the Local Church-Rosser property) in double-pushout graph transforma- 
tion (see, e.g., [7]). The related area of term rewriting is about the transforma- 
tion of terms, or trees, by rules with variables. Commutativity has been studied 
for term rewriting as well, along with a more general property, called joinability.
Commutativity and joinability are important prerequisites for showing that
a transformation mechanism has unique normal forms: If all competing direct
derivations are commutative (joinable), transformation is strongly confluent (or
locally confluent, resp.). Strongly confluent, and terminating locally confluent
“abstract reduction systems” do have unique normal forms. (See, e.g., [10].)

We re-phrase commutativity and joinability for hierarchical graph transfor- 
mation. 

Definition 7 (Commutativity and Joinability). A pair of direct derivations
$H \Leftarrow G \Rightarrow H'$ of the same hierarchical graph is called competing if $H \not\cong H'$.
Competing direct derivations are

— commutative if $H \Rightarrow M \Leftarrow H'$, and

— joinable if $H \Rightarrow^{*} M \;{}^{*}\!\Leftarrow H'$,

for some hierarchical graph $M$, respectively. (See Figs. 3 and 4 below.)
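
The difference between the two notions can be illustrated on an abstract reduction system, where states stand for hierarchical graphs up to isomorphism and `step` returns the one-step successors of a state. The following sketch is an assumption-laden illustration: the function names are hypothetical, states must be hashable, and the search is only guaranteed to finish when finitely many states are reachable.

```python
def reachable(step, start):
    """All states reachable from start in zero or more steps."""
    seen, todo = {start}, [start]
    while todo:
        x = todo.pop()
        for y in step(x):
            if y not in seen:
                seen.add(y)
                todo.append(y)
    return seen


def commutative(step, h1, h2):
    """h1 and h2 reach a common state M in exactly one step each."""
    return bool(set(step(h1)) & set(step(h2)))


def joinable(step, h1, h2):
    """h1 and h2 reach a common state M in any number of steps."""
    return bool(reachable(step, h1) & reachable(step, h2))
```

For instance, with the successor relation `{"G": ["H", "H2"], "H": ["M"], "H2": ["I"], "I": ["M"], "M": []}`, the competing results `"H"` and `"H2"` are joinable in `"M"` but not commutative, because `"H2"` needs two steps to get there.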



Fig. 3. Commutativity

Fig. 4. Joinability



For double-pushout graph transformation it has been shown that commuta- 
tivity holds if competing direct derivations are parallelly independent of each 
other (see, e.g., [12,7,14]). For term rewriting, the presence of variables in rules 
has made it necessary to study joinability. Term rewriting steps are joinable if 
they are non-critically overlapping. 

We shall first demonstrate that both criteria, parallel independence as well as 
non-critical overlaps, carry over to hierarchical graph transformation. However, 
since hierarchical graphs generalize both graphs and terms, these criteria turn 
out to be special cases of a more general condition for joinability that will be 
discussed in the sequel. 

General Assumption. In the following, let $H \Leftarrow_{p,\sigma,g} G \Rightarrow_{p',\sigma',g'} H'$ be a pair
of competing direct derivations using the rules $p = (L \leftarrow K \rightarrow R)$ and
$p' = (L' \leftarrow K' \rightarrow R')$.

The morphism $g$ of the rule instance $L\sigma$ in $G$ defines a skeleton fragment
$g(\underline{L})$ of $G$ that contains an interface fragment $g(\underline{K}) \subseteq g(\underline{L})$; also, every variable
edge $e$ in $L$ defines a variable fragment $g(\sigma(e))$. In the same way, $g'$ defines a
skeleton fragment $g'(\underline{L'})$ of $G$ with an interface fragment $g'(\underline{K'})$, and variable
fragments $g'(\sigma'(e'))$ for the variable edges $e'$ in $L'$.

Figure 5 below illustrates how “classical” parallel independence carries over 
to hierarchical graph transformation. Competing direct derivations are paral- 
lelly independent if and only if the images of the rules’ left-hand side skeletons 
(the semicircles) overlap only in their skeleton interface fragments (the white 
areas of the semicircles). The deleted part of the skeleton fragments (drawn as 
grey semicircles) and their variable fragments (drawn as grey boxes) must be 
disjoint. In this situation, competing direct derivations leave the occurrence of 
the respective other rule intact; they commute by a direct derivation using the 
other rule at the unchanged occurrence. 

Figure 6 shows the non-critical overlap of two direct derivations. The left-hand
side of one rule must occur completely inside a single variable fragment
(of $x$ in the illustration) of the other rule. In this case, the competing direct
derivations are not commutative. In general, several steps may be necessary
to join them again. Let $p$ be the rule subsuming the occurrence of $p'$ in the
variable fragment $g(\sigma(x))$. In this example $x$ occurs twice in $p$’s left-hand side.
A direct derivation with $p$ leads to a hierarchical graph $H$ wherein $g(\sigma(x))$ will
occur as often as $x$ occurs in $p$’s right-hand side, say $i$ times. Then $H$ contains
$i \geq 0$ occurrences of the left-hand side of $p'$. The occurrences of $p'$ in $G$ and
in $H$ are parallelly independent, and can be transformed in 2 and $i$ steps with
$p'$, respectively. In the resulting graphs, every variable fragment of $x$ has been
transformed in the same way, so that there is a direct derivation with $p$ between
the hierarchical graphs, which joins the derivations.






Fig. 5. “Classical” parallel independence

Fig. 6. Non-critical overlap



Definition 8 (“Classical” Parallel Independence). A pair of competing
direct derivations is “classically” parallelly independent if the intersection of
$g(L\sigma)$ and $g'(L'\sigma')$ in $G$ is contained in the intersection $g(\underline{K}) \cap g'(\underline{K'})$ of their
skeleton interface fragments.

Fact. “Classically” parallelly independent direct derivations are commutative. 

Definition 9 (Non-critical Overlap). A pair of competing direct derivations
is non-critically overlapping if the intersection of $L\sigma$ and $L'\sigma'$ in $G$ consists of
items of a single variable fragment, that is, $g(L\sigma) \subseteq g'(\sigma'(e'))$ for some variable
edge $e'$ in $L'$ or, vice versa, $g'(L'\sigma') \subseteq g(\sigma(e))$ for some variable edge $e$ in $L$.
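
Both criteria reduce to set inclusions between images in the host graph, which the following sketch makes explicit. It assumes that each image is given as a plain set of host items (nodes and edges, tagged so that they cannot be confused) and that variable fragments are given as a dictionary from variable edges to such sets; the function names are hypothetical.

```python
def classically_independent(img_l, img_k, img_l2, img_k2):
    """The two occurrences overlap only inside both interface fragments."""
    return (img_l & img_l2) <= (img_k & img_k2)


def non_critically_overlapping(img_l, frags, img_l2, frags2):
    """One occurrence lies entirely inside a variable fragment of the other.

    frags:  variable edge of L  -> set of host items in its fragment
    frags2: variable edge of L' -> set of host items in its fragment
    """
    return (
        any(img_l <= f for f in frags2.values())
        or any(img_l2 <= f for f in frags.values())
    )
```

Computing the images themselves requires the occurrence morphisms and the substitutions; the sketch only covers the final comparison.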

Theorem 2. Non-critically overlapping derivations are joinable.

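Proof Sketch. Let $H \Leftarrow_{p,\sigma,g} G \Rightarrow_{p',\sigma',g'} H'$ be non-critically overlapping.
Without loss of generality, $g'(L'\sigma') \subseteq g(\sigma(e))$ for some variable edge $e$ in $L$ with
label, say $x$. Assume first that $g$ is top. By the Restriction Lemma [7], there is a
restricted direct derivation $d'(e)$ of the direct derivation $d': G \Rightarrow_{p'} H'$ to the
variable fragment $g(\sigma(e))$ with result, say $H'(e)$. By Theorem 1, $G \cong L^{*}\sigma^{*}$ and
$H \cong R^{*}\sigma^{*}$. By the Embedding Lemma [7], the direct derivation $d'(e)$ can be
embedded into every variable fragment $g(\sigma(e'))$ of $G \cong L^{*}\sigma^{*}$ with $\mathit{lab}_L(e') = \mathit{lab}_L(e)$.
The embedded derivations are parallelly independent. By parallel independence
[12,7], there is a derivation $L^{*}\sigma^{*} \Rightarrow^{*}_{p'} L^{*}\tau^{*}$ where $\tau^{*}$ is the modification of the
substitution $\sigma^{*}$ with $\tau^{*}(x) = H'(e)$ and $\tau^{*}(y) = \sigma^{*}(y)$ otherwise. By Theorem 1,
there is a direct derivation $L^{*}\tau^{*} \Rightarrow_p R^{*}\tau^{*}$. The direct derivation $d'(e)$ can be
embedded into every variable fragment $g(\sigma(e'))$ of $R^{*}\sigma^{*}$ with $\mathit{lab}_R(e') = \mathit{lab}_L(e)$.
Again, the embedded derivations are parallelly independent. Thus, there is a
derivation $R^{*}\sigma^{*} \Rightarrow^{*}_{p'} R^{*}\tau^{*}$, and the direct derivations are joinable, see below.

$$
\begin{array}{ccccc}
 & & L^{*}\sigma^{*} & & \\
 & {\scriptstyle p}\swarrow & & \searrow{\scriptstyle p',\,*} & \\
R^{*}\sigma^{*} & & & & L^{*}\tau^{*} \\
 & {\scriptstyle p',\,*}\searrow & & \swarrow{\scriptstyle p} & \\
 & & R^{*}\tau^{*} & &
\end{array}
$$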

Now, if $g$ is not top, let $G'$ be the part of $G$ where $g$ is top. Then there are
competing derivations $H \Leftarrow G \Rightarrow H'$ that have joining derivation sequences
$H \Rightarrow^{*} M \;{}^{*}\!\Leftarrow H'$. Since derivations are closed under the part relation, graphs $H$,
$H'$ and $M$ can be constructed by replacing the parts corresponding to $G'$ in
those graphs so that we get the diagram above.

Example 3 (Commuting and Joining Control Flow Derivations). Figure 7 shows
several direct derivations of control flow graphs. The graph in the middle of the
top row can be transformed in four ways: The rule loop applies to the loop in the
else part of the top branch, and inl can be applied to its procedure call edges.
To its left, we see the graph after applying inl to the procedure call on the left;
beneath it we see the result of applying loop to the loop in its else part; the
result of applying inl twice, to the (isomorphic) procedure calls in that loop, is
shown on the right.

The loop step is “classically” parallelly independent of the left inl step, and
the result of the commuting steps is shown in the lower left. Both occurrences of
the inl steps leading to the right are contained in the fragment of the variable D
of the loop rule; since this variable occurs twice on the left-hand side, and once
on the right-hand side of loop, two steps are needed in the top row, and one in
the bottom row, until they lead to the graphs on the right where the loop rule
can be applied again.

Fig. 7. Parallel independent transformations of control flow graphs

Consider the noncritical overlap illustrated in Fig. 6. For term rewriting, 
where trees are being transformed, the occurrence of a rule (like p') can only 
overlap with a single variable fragment (of x), because the occurrence is con- 
nected, and the variable fragment is disconnected from other variable fragments. 
However, graphs need not be connected so that further situations arise in the 
case of hierarchical graph transformation, which are sketched in Fig. 8. 

In the situation on the left, p' overlaps with the skeleton interface fragment, 
and with two variable fragments of p. The competing derivations would be join- 
able if the involved fragments of p are preserved in the direct derivation with p. 
This is the case if they are left intact, i.e., if the involved variables occur once
on both sides of p because the skeleton of p is also involved. In the situation on 
the right, this need not be the case as the skeleton fragment is not involved in 
the overlap. Here, it suffices when the involved variables have the same number 
of occurrences on both sides of p. 

Thus, whenever the intersection of $L\sigma$ and $L'\sigma'$ in $G$ consists of several variable
fragments, we have to require that one occurrence induces a decomposition
of the other rule into subrules such that the fragments can be transformed
separately by the subrules. Furthermore, the transformation must be consistent, i.e.,
same fragments have to be transformed in the same way, and complete. Finally,
the transformation must be repetitive, i.e., after the application of the rule
a complete parallel transformation must be possible.

A variable edge $e$ in $L$ is involved in the direct derivation $G \Rightarrow_{p'} H'$ if the
intersection of the skeleton fragment $g'(\underline{L'})$ and the variable fragment $g(\sigma(e))$
is non-trivial, i.e. if the intersection consists not only of points. The label of an
involved edge is called an involved label.

Definition 10 (Fragmented Direct Derivations). Let $(d, d')$ be a pair of
direct derivations. Then $d'$ is g-decomposable if there is a decomposition of $d'$ into
a non-changing subderivation on the skeleton fragment and subderivations $d'(e)$
on the variable fragments for $e$ in $L^{*}$. In this case, we speak of a g-decomposition
of $d'$. A g-decomposition is consistent if $\mathit{lab}_L(e) = \mathit{lab}_L(e')$ implies $r(e) = r(e')$
for all involved edges. It is complete if there is no not-involved edge with involved
label. It is completable if $d'$ can be extended to a derivation $G \Rightarrow^{*}_{p'} I'$ with
complete set of involved edges. A g-decomposable, consistent, and completable
direct derivation is called g-compatible. The direct derivation $d$ is g'-repetitive
if there is a derivation $H \Rightarrow^{*}_{p'} R^{*}\tau^{*}$ for some substitution $\tau^{*}$. The pair $(d, d')$ is
fragmented if $d'$ is g-compatible and $d$ is g'-repetitive, or $d$ is g'-compatible and
$d'$ is g-repetitive.

Fact. Every g-compatible direct derivation $G \Rightarrow_{p'} H'$ through a top morphism
$g'$ can be extended to a derivation $G \Rightarrow^{+}_{p'} L^*\tau^*$ for some substitution $\tau^*$.

Proof Sketch. Let $d'\colon G \Rightarrow_{p'} H'$ be g-compatible. Then $d'$ is g-decomposable,
consistent, and completable. By g-decomposability, there is a decomposition of
$d'$ into a non-changing subderivation on the skeleton fragment and subderivations
$d'(e)$ on the variable fragments for $e$ in $L^*$ such that $H'$ is obtained from $L^*$ by
replacing the ordinary variables $e$ in $L^*$ by the result $\tau(e)$ of the subderivation
$d'(e)$ and the context variable in $L^*$ by the intermediate hierarchical graph $D$.
By consistency, the replacements define a substitution

$$\tau^* = \{\mathit{lab}_L(e) \mapsto \tau(e) \mid e \in E_L\} \cup \{x \mapsto D\}.$$




Fig. 8. Fragmented parallel independence 
In the case of completeness, $H' = L^*\tau^*$. In the case of completability, $d'$ can be
extended to a derivation $G \Rightarrow^{*}_{p'} I'$ such that $I' = L^*\tau^*$.

Definition 11 (Fragmented Parallel Independence). A pair of direct deri- 
vations (d, d') is fragmentedly parallel independent if the skeleton fragments over- 
lap only in the skeleton interface fragments, and if (d, d') is fragmented. 

Theorem 3 (Fragmented Parallel Independence). Every pair of fragment- 
edly parallel independent direct derivations is joinable. 

Proof Sketch. Let $d\colon G \Rightarrow_{p,\sigma,g} H$ and $d'\colon G \Rightarrow_{p',\sigma',g'} H'$ be fragmentedly parallel
independent. Without loss of generality, assume that $d'$ is g-compatible and $d$ is
g'-repetitive. We first consider the case that $g$ is top. By Theorem 1, $G = L^*\sigma^*$
and $H = R^*\sigma^*$. By the g-compatibility of $d'$, there is a derivation $L^*\sigma^* \Rightarrow^{+}_{p'} L^*\tau^*$
for some substitution $\tau^*$. By Theorem 1, there is a direct derivation $L^*\tau^* \Rightarrow_{p} R^*\tau^*$.
By g'-repetitiveness of $d$, there is a derivation $R^*\sigma^* \Rightarrow^{*}_{p'} R^*\tau^*$. Thus, the
direct derivations are joinable.

[Diagram: joinability square with corners $L^*\sigma^*$, $R^*\sigma^*$, $L^*\tau^*$ and $R^*\tau^*$.]

The case that g is not top can be reduced to the situation above by the same 
argument as in the proof of Thm. 2. 

Note that fragmented parallel independence subsumes both “classical” par- 
allel independence (illustrated in Fig. 5), and non-critical overlaps (shown in 
Fig. 6): Only the skeleton fragment of p is involved in the first case, and a single 
variable fragment is involved in the second case. 








Fig. 9. The rule fold Fig. 10. The rule join 



Example (Fragmented Parallel Independence of Control Flow Graph Transformations).
In Figs. 9 and 10, we define two rules that illustrate particular cases
of fragmented parallel independence: If some control flow graph D matches the
variable fragment of a procedure, fold replaces the body D by a call to that
procedure, and if a control flow graph ends in two copies of the same subdiagram
T, join redirects one of them to the other. (The empty assignment represents a
neutral computation.)

The rule fold is a parallel rule of the form $\mathit{fold} = \mathit{id} + \mathit{inl}^{-1}$ (where id is the
identical rule) and hence decomposable into two subrules. The rule join is not
decomposable. Figure 11 shows two fragmentedly parallel independent direct
derivation steps through the rules fold and loop that overlap in a nontrivial
way: The occurrences of the left-hand sides intersect not only in the body of the
loop (which is an instantiation of the variable D in loop), but also in the
successor state of the branch at the bottom. Nevertheless, the direct derivations
are joinable, as the fold rule divides into two rules, and one of them is just the
identity.





Fig. 11. Fragmented parallel independent transformations of control flow graphs 






5 Conclusion 

We have studied under which conditions direct transformations of a graph are 
independent so that they can be joined to a common graph by subsequent trans- 
formations. Graphs and rules have been generalized by concepts known from 
term rewriting: Graphs are equipped with a tree-like hierarchy, as edges may 
contain graphs which are again hierarchical; rules contain graph variables by 
which subgraphs of arbitrary size can be compared, deleted, or copied in a sin- 
gle transformation step. Our results combine properties known for plain graph 
transformation and term rewriting. 

To our knowledge, parallel independence of graph transformation has only
been studied for the double- and single-pushout approaches. In both cases,
neither hierarchies nor graph variables have been considered. Parallel independence
has also been investigated in the more general framework of adhesive high-level 
replacement systems [15,13]. It looks as if hierarchical graph transformation 
without variables is an instance of adhesive high-level replacement; this is not 
true for hierarchical graph transformation with variables, however. 

The study of parallel independence has led to critical pair lemmata, both for
term rewriting and for graph transformation [16]: whenever transformation steps 
are not parallelly independent, these systems are locally confluent if joinability 
can be shown for finitely many critical pairs of graphs and terms, respectively. 
Since parallel independence of hierarchical graph transformation turned out to 
be a reasonable combination of the results for graph transformation and term 
rewriting, we shall try to combine these lemmata to obtain a critical pair lemma 
for hierarchical graph transformation as well. 

Furthermore, local confluence implies general confluence if the rules are also 
terminating. Since termination can be characterized by the finiteness of so-called 
forward closures of rules, both for term rewriting and for graph transforma- 
tion [17], we think it may be possible to combine these results to a similar 
theorem for hierarchical graph transformation. Finally, if we are able to find 
decidable sufficient criteria for termination, this, together with a critical pair 
lemma, would allow us to decide confluence in restricted cases. This would give
immediate benefits for the analysis of DiaPlan, a language for programming 
with graphs and diagrams that shall be based on hierarchical graph transforma- 
tion [11]. 
A Use of Skeleton Interfaces in Rule Instances

The “only-if” direction of Theorem 1 requires that the interface of a rule
instance is its skeleton, and not its instance. Otherwise the rule instance would be
applicable to graphs where the substitutive rule does not apply. This shall be
illustrated by an example.

The rule $p = (L \leftarrow K \rightarrow R)$ shown in Fig. 12 has the interface variable $A$.
If $p$ were instantiated by the substitution $\sigma = \{A \mapsto$ [graph not reproduced]$\}$, the (extended)
rule instance $p\sigma = (L\sigma \leftarrow K\sigma \rightarrow R\sigma)$ has a direct derivation $d\colon G \Rightarrow_{p\sigma} H$.
However, there is no way to extend $\sigma$ to a substitution $\sigma^*$ for the substitutive rule
$p^* = (L^*, R^*)$ so that $L^*\sigma^* = G$: The context variable $C$ cannot be instantiated
by the substitution pair $D \mapsto$ [graph not reproduced] because that graph is connected to an “inner
node” of $A$'s substitution. (There is a substitution $\sigma'$ where $\sigma'(A)$ is a single
point, and $\sigma'(C) = \sigma^*(A) \cup \sigma^*(C)$, but the instance $p^*\sigma'$ does not derive $H$, but
a subgraph of $H$ where the right a-edge is missing.)




Fig. 12. Skeleton interfaces in instances 
Abstract. Code generators are widely used in the development of
embedded software to automatically generate executable code from
graphical specifications. However, at present, code generators are not as mature
as classical compilers and they need to be extensively tested. This paper
proposes a technique for systematically deriving suitable test cases for
code generators, involving the interaction of chosen sets of rules. This
is done by formalising the behaviour of a code generator by means of
graph transformation rules and exploiting unfolding-based techniques.
Since the representation of code generators in terms of graph grammars
typically makes use of rules with negative application conditions, the
unfolding approach is extended to deal with this feature.



1 Introduction 

The development of embedded software has become increasingly complex and 
abstraction appears to be the only viable means of dealing with this complexity. 
For instance, in the automotive sector, the way embedded software is developed 
has changed in that executable models are used at all stages of development, 
from the first design phase up to implementation (model-based development).
Such models are designed with popular and well-established graphical modelling
languages such as Simulink or Stateflow from The MathWorks (see footnote 1).
While in the past the models were implemented manually by the programmers,
some recent approaches allow the automatic generation of efficient code directly
from the software model via code generators. However, at present, they are not as
mature as tried and tested C or Ada compilers and their output must be checked
with almost the same expensive effort as for manually written code.

* Research partially supported by EU FET-GC Project IST-2001-32747 AGILE, the 
EC RTN 2-2001-00346 SegraVis, DFG projects SANDS and the IMMOS project 
funded by the German Federal Ministry of Education and Research (project ref. 
01ISC31D). 

1 See www.mathworks.com
One of the main problems in code generator testing is the lack of a method
for describing, in a clear and formal way, the mode of operation of the code
generator’s transformation rules and the interaction between such rules (this
is especially true for optimisation rules), which makes it hard to devise
effective and meaningful tests. Therefore, an essential prerequisite for testing a
code generator is the choice of a formal specification language which describes
the code generator’s mode of action in a clear way [6].

When dealing with a code generator which translates a graphical source lan- 
guage into a textual target language (e.g. C or Ada), a natural approach consists 
in representing the generator via a set of graph transformation rules. Besides pro- 
viding a clear and understandable description of the transformation process, as 
suggested in [19], this formal specification technique can be used for test case 
derivation, allowing the specific and thorough testing of individual transforma- 
tion rules as well as of their interactions. We remark that, while each rule is 
specified with a single transformation step in mind, it might be quite difficult 
to gain a clear understanding of how different rules are implemented and how 
they can interfere over an input graph. Testing all input models triggering any 
possible application sequence is impractical (if not impossible), because of the 
large (possibly infinite) number of combinatorial possibilities, and also unneces- 
sary as not all combinations will lead to useful results. It is, however, of crucial 
importance to select those test cases which are likely to reveal errors in the code 
generator’s implementation. 

In this paper we will use unfoldings of graph transformation systems [18, 4] 
in order to produce a compact description of the behaviour of code generators, 
which can then be used to systematically derive suitable test cases, involving 
the interaction of chosen sets of (optimising) rules. Our proposal is based on
the definition of two graph grammars: the generating grammar, which generates
all possible input models for the code generator (Simulink models, in this
paper), and the optimising grammar, which formalises specific transformation steps
within the code generator (here we focus only on optimisations). The structure
obtained by unfolding the two grammars describes the behaviour of the code 
generator on all possible input models. Since the full unfolding is, in general, 
infinite, the procedure is terminated by unfolding the grammars up to a finite 
causal depth which can be chosen by the user. Finally, we will show how the 
unfolded structure can be used to select test cases (i.e., code generator input 
models), which are likely to uncover an erroneous implementation of the opti- 
misation techniques (as specified within the second graph grammar). The task 
of identifying sets of rules whose interaction could be problematic and should 
thus be tested might require input from the tester. However, once such sets are
singled out, the proposed technique makes it possible to automatically determine
corresponding test cases, namely input models triggering the desired behaviours,
straight from the structure produced via the unfolding procedure.

The behaviour of code generators is naturally represented by graph grammars 
with negative application conditions [9], while the unfolding approach has been 
developed only for “basic” double- or single-pushout graph grammars [18,4]. 
Hence, a side contribution of the paper is the generalisation of the unfolding
construction to a class of graph grammars with negative application conditions,
of which, due to space limitations, we will provide only an informal account.

The rest of the paper is structured as follows. Section 2 gives an overview 
of the automatic code generation approach and discusses code generator testing 
techniques. Section 3 presents the class of graph transformation systems used 
in the paper. Section 4 discusses the idea of specifying a code generator and 
its possible input models by means of graph transformation rules. Section 5 
presents an unfolding-based technique for constructing a compact description of 
the behaviour of a code generator and Section 6 shows how suitable test cases can 
be extracted from such a description. Finally, Section 7 draws some conclusions. 



2 Automatic Code Generation 

In the process of automatic code generation, a graphical model, consisting for 
instance of dataflow graphs or state charts, is translated into a textual language. 
First, a working graph free of layout information is created and, in the next 
step, a gradual conversion of the working graph into a syntax tree takes place. 
In the individual transformation phases from the working graph to the syntax 
tree, optimisations are applied in which, for instance, subgraphs are merged, 
discarded or redrawn. Finally, actual code generation is performed, during which 
the syntax tree is translated into linear code. 

In practice, a complete test in this framework is impossible due to the large or 
even infinite number of possible input situations. Accordingly, the essential task 
during testing is the determination of suitable (i.e. error-sensitive) test cases, 
which ultimately determines the scope and quality of the test. 

In the field of compiler testing much research has been done concerning test 
case design, namely test case generation techniques. We can distinguish two main 
approaches: automatic test case generation and manual test case generation. 
The first approach yields a great number of test cases in a short time and at a 
relatively low cost. In most cases, as originally proposed by Purdom [17], test 
programs are derived from a grammar of the source language by systematically 
exercising all its productions. An overview of this and related approaches is given 
in [6]. However, the quality of the test cases is questionable because the test case 
generation process is not guided by the requirements (i.e. the specification). 

A different (and more reliable) method is to generate test cases manually
with respect to given language standards, like the Ada Conformity Assessment
Test Suite (ACATS, see footnote 2), or commercial test suites for ANSI/ISO C
language conformance (see footnote 3). However, there is no published standard
for graphical source languages such as Simulink or Stateflow. Moreover, the manual
creation and maintenance of test cases is cost-intensive, time-consuming and also
requires knowledge about tool internals.

2 See www.adaic.com

3 ANSI/ISO FIPS-160 C Validation Suite (ACVS) by Perennial, www.peren.com 
A technique for testing a code generator systematically on the basis of graph
rewriting rules was proposed in [19]. The graph-rewriting rules themselves are 
used as a blueprint for systematic test case (i.e. model) generation. Additionally, 
the mentioned paper shows how such models can be used in practice. A two-level 
hierarchy of testing is proposed: First, suitable input models to be used as test 
cases for the code generator are determined; then the behaviour of the code gen- 
erated from such models over specific (suitably chosen) input data is compared 
with that of the (executable) specification, in order to ensure correctness. A dif- 
ference with the work in the present paper is that in [19] a test case is selected 
on the basis of a single rule, while here we will consider the interaction of several 
rules to derive test cases which can trigger these complex behaviours. Still the 
methodology proposed in [19] to use the test cases once they are available, can 
be applied also to the test cases produced via the technique in our paper. 

Graph transformation systems have also been used in other ways in con- 
nection with code generator specification or verification. For instance, in [11, 
16] graph and tree replacement patterns are used for verifying a code generator 
formally and in [2] graph rewriting rules are used for generating an optimiser. 
A complete code generator, capable of translating Simulink or Stateflow models 
into C code, has been specified in [14] with the Graph Rewriting and Transfor- 
mation language GReAT [12]. 

3 Graph Transformation Systems 

We use hypergraphs, which allow us to conveniently represent functions with
n arguments by (n + 1)-ary hyperedges (one connection for the result, the rest
for the parameters). Moreover we use graph rewriting rules as in the
double-pushout approach [8,10] with added negative application conditions [9]. A rule,
apart from specifying a left-hand side graph that is removed and a right-hand 
side graph that replaces it, specifies also a context graph that is preserved, and 
forbidden edges that must not occur attached to the left-hand side. 

Hereafter $\Lambda$ is a fixed set of edge labels and each label $l \in \Lambda$ is associated with
an arity $ar(l) \in \mathbb{N}$. Given a set $A$, we denote by $A^*$ the set of finite sequences of
elements of $A$ and for $s \in A^*$, $|s|$ denotes its length.

Definition 1 (Hypergraph). A ($\Lambda$-)hypergraph $G$ is a tuple $(V_G, E_G, c_G, l_G)$,
where $V_G$ is a set of nodes, $E_G$ is a set of edges, $c_G\colon E_G \to V_G^*$ is a connection
function and $l_G\colon E_G \to \Lambda$ is the labelling function for edges satisfying
$ar(l_G(e)) = |c_G(e)|$ for every $e \in E_G$. Nodes are not labelled.
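
Definition 1 can be illustrated by a small data structure. The following is a minimal sketch, not taken from the paper: the class name `Hypergraph`, the table `ARITY` and the concrete arities are assumptions made for illustration only. It stores the connection and labelling functions in one dictionary and enforces the arity condition when an edge is added.

```python
from dataclasses import dataclass, field

# Hypothetical label set with arities (assumption: a binary sum is a 3-ary
# hyperedge, one connection for the result and two for the arguments).
ARITY = {"+": 3, "I": 1, "E": 2}


@dataclass
class Hypergraph:
    nodes: set = field(default_factory=set)
    # edge id -> (label, tuple of attached nodes), i.e. l_G and c_G together
    edges: dict = field(default_factory=dict)

    def add_edge(self, edge_id, label, attached):
        attached = tuple(attached)
        # arity condition of Definition 1: ar(l_G(e)) = |c_G(e)|
        if len(attached) != ARITY[label]:
            raise ValueError(f"label {label!r} needs {ARITY[label]} nodes")
        if not set(attached) <= self.nodes:
            raise ValueError("edge attached to unknown node")
        self.edges[edge_id] = (label, attached)

    def isolated_nodes(self):
        used = {v for _, attached in self.edges.values() for v in attached}
        return self.nodes - used


g = Hypergraph(nodes={1, 2, 3, 4})
g.add_edge("sum", "+", (1, 2, 3))  # result node 1, arguments 2 and 3
g.add_edge("int", "I", (2,))
print(g.isolated_nodes())          # node 4 is attached to no edge
```

The helper `isolated_nodes` is included because the rule and grammar definitions that follow repeatedly require graphs without isolated nodes.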

Hypergraph morphisms $\varphi\colon G \to G'$ and isomorphisms are defined as usual.

Definition 2 (Graph rewriting rules with negative conditions). A graph
rewriting rule $r$ is a tuple $(L \xleftarrow{\varphi_L} I \xrightarrow{\varphi_R} R, N)$ where $\varphi_L\colon I \to L$ and $\varphi_R\colon I \to R$ are
injective graph morphisms. We call $L$ the left-hand side, $R$ the right-hand side
and $I$ the context. We assume that (i) $\varphi_L$ is bijective on nodes, (ii) $L$ does not
contain isolated nodes, (iii) any node isolated in $R$ is in the image of $\varphi_R$.
Furthermore $N$ is a set of injective morphisms $\nu\colon L \to L_\nu$, called negative
application conditions, where (iv) $E_{L_\nu} - \nu(E_L)$ contains a single edge referred
to as $e_\nu$ and (v) $L_\nu$ does not contain isolated nodes.

A rule $r = (L \xleftarrow{\varphi_L} I \xrightarrow{\varphi_R} R, N)$ consists of two components. The first component
$L \xleftarrow{\varphi_L} I \xrightarrow{\varphi_R} R$ is a graph production, specifying that an occurrence of the
left-hand side $L$ can be rewritten into the right-hand side graph $R$, preserving the
context $I$. Condition (i), stating that $\varphi_L$ is bijective on nodes, ensures that no
nodes are deleted. Nodes may become disconnected, having no further influence
on rewriting, and one can imagine that they are garbage-collected. Actually,
Conditions (ii) and (iii) essentially state that we are interested only in rewriting
up to isolated nodes. By (iii) no node is isolated when created and by (ii) nodes
that become isolated have no influence on further reductions.

The second component $N$ is the set of negative application conditions.
Intuitively, each $L_\nu$ extends the left-hand side $L$ with an edge which must not
be connected to the match of $L$ to allow a rule to be applied. The negative
application conditions here are weaker than in [9]. This will allow us to represent
negative application conditions by inhibitor arcs in the unfolding (see Section 5).

Definition 3 (Match). Let $r = (L \xleftarrow{\varphi_L} I \xrightarrow{\varphi_R} R, N)$ be a graph rewriting rule and
let $G$ be a graph. Given an injective morphism $\varphi\colon L \to G$, a falsifying extension
for $\varphi$ is an injective morphism $\varphi'\colon L_\nu \to G$ such that $\varphi' \circ \nu = \varphi$ for some $\nu \in N$.
In this case $\varphi'(e_\nu)$ is called a falsifying edge for $\varphi$. The morphism $\varphi$ is called a
match of $r$ whenever it does not admit any falsifying extension.
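
The check for falsifying extensions in Definition 3 can be sketched as follows. This is an illustrative sketch under several assumptions: the injective morphism from the left-hand side into the host graph is taken as given (it is not computed or validated), each negative application condition is encoded only by its single extra edge, and the function names, the `("old", v)` / `("new", name)` encoding and the example graph are all hypothetical.

```python
def falsifying_edges(graph_edges, node_map, edge_map, nac_label, nac_attached):
    """Return ids of host edges that falsify the morphism for one condition.

    graph_edges : edge id -> (label, tuple of nodes) of the host graph G
    node_map    : node of L -> node of G (the morphism on nodes)
    edge_map    : edge of L -> edge of G (the morphism on edges)
    nac_attached: attachment of the forbidden edge, entries ("old", v) for a
                  node v of L or ("new", name) for a node outside L
    """
    matched_edges = set(edge_map.values())
    matched_nodes = set(node_map.values())
    found = []
    for edge_id, (label, attached) in graph_edges.items():
        # injectivity on edges: the forbidden edge is not an edge of L
        if edge_id in matched_edges or label != nac_label:
            continue
        if len(attached) != len(nac_attached):
            continue
        fresh = {}  # node outside L -> node of G chosen for it
        ok = True
        for (kind, name), target in zip(nac_attached, attached):
            if kind == "old":
                ok = node_map[name] == target
            elif name in fresh:
                ok = fresh[name] == target
            else:
                # injectivity on nodes: avoid every node already used
                ok = target not in matched_nodes and target not in fresh.values()
                fresh[name] = target
            if not ok:
                break
        if ok:
            found.append(edge_id)
    return found


def is_match(graph_edges, node_map, edge_map, nacs):
    """The morphism is a match if no condition has a falsifying edge."""
    return not any(
        falsifying_edges(graph_edges, node_map, edge_map, label, attached)
        for label, attached in nacs
    )


# Hypothetical host graph: a sum edge, an integer edge and a connecting edge.
host = {"s": ("+", (1, 2, 3)), "i": ("I", (4,)), "c": ("E", (4, 2))}
node_map = {"r": 1, "x": 2, "y": 3, "a": 4}
edge_map = {"sum": "s", "int": "i"}
nac = ("E", [("old", "a"), ("old", "x")])  # no E-edge from a to x allowed

print(is_match(host, node_map, edge_map, [nac]))
host_without_c = {k: v for k, v in host.items() if k != "c"}
print(is_match(host_without_c, node_map, edge_map, [nac]))
```

In the first call the edge `c` plays the role of a falsifying edge, so the morphism is not a match; once `c` is removed, the same morphism is a match.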

Given a graph $G$ and a match in it, $G$ can be rewritten to $H$ (in symbols:
$G \Rightarrow H$), by applying rule $r$ as specified in the double-pushout approach [8].

Definition 4 (Graph grammar). A graph grammar $\mathcal{G} = (\mathcal{R}, G_0)$ consists of
a set of rewriting rules $\mathcal{R}$ and a start graph $G_0$ without isolated nodes. We say
that a graph $G$ is generated by $\mathcal{G}$ whenever $G_0 \Rightarrow^* G$.



4 Specifying Code Generation by Graph Transformation 

In our setting, code generation starts from an internal graph representation of a 
Simulink or Stateflow model, free of layout information. Especially the first steps 
of code generation, involving optimisations that change the graph structure of 
a model (e.g. for dead code elimination), can be naturally described by graph 
rewriting rules. In the sequel, the set of optimising rules is called optimising 
grammar, even if we do not fix a start graph. Since our aim is to test the code 
generator itself, independently of a specific Simulink model, we need some means 
to describe the set of all possible models that can be given as input to the code 
generator. In our proposal this is seen as a graph language generated by another 
grammar, called generating grammar. 




Generating Test Cases for Code Generators 







Fig. 1. Edge types for the graph rewriting system (constant folding): sum,
product, integer, variable, result edge, connection.



Example: We illustrate the above concepts with an example describing the first 
steps of code generation, starting from acyclic graphs which represent arithmetic 
expressions. We will give only excerpts of the two graph rewriting systems: the 
generating grammar, describing acyclic graphs which represent arithmetic ex- 
pressions, and the optimising grammar, describing constant folding, i.e., simpli- 
fication and partial evaluation of arithmetic expressions. 

We assume that a maximal depth $a$ is fixed for arithmetic expressions. Then
we will use the edge types depicted in Fig. 1 for $1 \le i \le a$. Integers and variables
are generically represented by I and V edges. Since in our setting we are mainly
concerned with structural optimisation steps, we do not consider attributes here:
As soon as a test case is generated, it can be equipped with suitable values for all
the constants involved. For instance, Fig. 3 shows an acyclic graph representing
the arithmetic expression $i_1 + (i_1 * i_2)$ for some arbitrary integers $i_1, i_2$.

The generating grammar $\mathcal{G}_g$, which is depicted in Fig. 2, generates operator,
result, integer and variable edges and connects them via E-edges (connecting
edges), provided no edge of this kind is present yet. The rules are specified in the
form “left-hand side $\Rightarrow$ right-hand side”. Edges of the context are drawn with
dashed lines and nodes of the context are marked with numbers. Negative
application conditions are depicted as crossed-out edges. Note that (CreateConn2)
is a rule schema: an E-edge between operator edges is only allowed if the first 
operator has a smaller (arithmetic) depth than the second one, i.e., if i < j, thus 
ensuring acyclicity. Some of the rules are missing, for example the rule generat- 
ing a product edge (analogous to the rule (CreateSum)) and several more rules 
connecting operator edges. The start graph is the empty graph. 

The optimising grammar $\mathcal{G}_o$ is (partially) presented in Fig. 4: We give rules
for reducing the sharing of constants (ConstantSplitting), for removing useless or 
isolated parts of the graph (KillUselessFunction), (KillLonelyEdge), and for sim- 
plifying the graph by evaluating the sum of two integers (ConstantFoldingSum). 
More specifically, the optimisation corresponding to the last rule computes the 
integer of the right-hand side as the sum of the two integers of the left-hand 
side. A requirement for its application is the absence of constant sharing. 



5 Unfolding Graph Transformation Systems 

The unfolding approach, originally devised for Petri nets [15], is based on the 
idea of associating to a system a single branching structure, representing all its 
possible runs, with all the possible events and their mutual dependencies. For 
Fig. 2. Rules of the generating grammar (constant folding).




Fig. 3. A graph representing the arithmetic expression $i_1 + (i_1 * i_2)$.



graph rewriting systems, the unfolding is constructed starting from the start 
graph, considering at each step the rewriting rules which could be applied and 
then recording in the unfolding possible rule applications and the graph items 
which they generate (see [18,4]). Here we sketch how the unfolding construction 
can be extended to graph grammars with negative application conditions. Space 
limitations keep us from giving a formal presentation of the theory. 

The unfolding of a graph grammar will be represented as a Petri net-like 
structure. We next introduce the class of Petri nets which plays a basic role in 
the presentation. Given a set $S$, we denote by $S^\oplus$ the set of multisets over $S$,
i.e., $S^\oplus = \{m \mid m\colon S \to \mathbb{N}\}$. A multiset $m$ can be thought of as a subset of $S$
where each $s \in S$ occurs with a multiplicity $m(s)$. When $m(s) \in \{0,1\}$ for all
$s \in S$ the multiset $m$ will often be identified with the set $\{s \in S \mid m(s) = 1\}$.

Definition 5 (Petri net with read and inhibitor arcs). Let $L$ be a set
of transition labels. A Petri net with read and inhibitor arcs is a tuple
$N = (S_N, T_N, {}^{\bullet}(), ()^{\bullet}, \underline{()}, {}^{\circledcirc}(), p_N)$ where $S_N$ is a set of places, $T_N$ is a set of
transitions and $p_N\colon T_N \to L$ is a labelling function. For any transition $t \in T_N$,
${}^{\bullet}t$, $t^{\bullet}$, $\underline{t}$, ${}^{\circledcirc}t \in S_N^\oplus$ denote the pre-set, post-set, context and set of inhibitor
places of $t$.

Fig. 4. Rules of the optimising grammar (constant folding).

When $s \in \underline{t}$ we say that $t$ is connected to $s$ via a read arc. In this case $t$ may
only fire if place $s$ contains a token. This token will not be affected by the firing.
On the other hand, when $s \in {}^{\circledcirc}t$ we say that $t$ is connected to $s$ via an inhibitor
arc and $t$ is allowed to fire only if $s$ does not contain a token. Read arcs [13]
will be used to represent, at the level of Petri nets, the effects arising from the 
possibility of preserving graph edges in a rewriting step. Inhibitor arcs [1] will 
be used to model the effects of the negative application conditions that we have 
at the level of graph grammar rules. 
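
The firing rule just described can be sketched in a few lines. This is a simplified illustration, not the construction of the paper: pre-set, post-set, context and inhibitor places are assumed to be plain sets rather than multisets, markings are `collections.Counter` objects, and the class name `Transition`, the functions `enabled` and `fire` and the example transition are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    name: str
    pre: frozenset                      # places whose token is consumed
    post: frozenset                     # places that receive a token
    context: frozenset = frozenset()    # read arcs: token required, not consumed
    inhibitors: frozenset = frozenset() # inhibitor arcs: token forbidden


def enabled(marking, t):
    """A transition may fire if pre-set and context are marked and no
    inhibitor place carries a token."""
    return (all(marking[s] >= 1 for s in t.pre | t.context)
            and all(marking[s] == 0 for s in t.inhibitors))


def fire(marking, t):
    if not enabled(marking, t):
        raise ValueError(f"{t.name} is not enabled")
    result = Counter(marking)   # copy, so the old marking is left unchanged
    result.subtract(t.pre)      # consume one token per pre-set place
    result.update(t.post)       # produce one token per post-set place
    return +result              # drop places whose count fell to zero


# Hypothetical transition: reads a sum edge and an integer edge, creates a
# connecting edge E, and is inhibited if such an edge already exists.
t = Transition("CreateConn1", pre=frozenset(), post=frozenset({"E"}),
               context=frozenset({"+", "I"}), inhibitors=frozenset({"E"}))
m0 = Counter({"+": 1, "I": 1})
m1 = fire(m0, t)
print(enabled(m0, t), enabled(m1, t))
```

After the first firing the place `E` is marked, so the inhibitor arc blocks a second firing while the tokens read via the context are still present.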

The mutual dependencies between transitions play a crucial role in the
definition of the unfolding. Given a net $N$, the causality relation $<_N$ is the least
transitive relation such that $t_1 <_N t_2$ if $t_1^{\bullet} \cap ({}^{\bullet}t_2 \cup \underline{t_2}) \neq \emptyset$, i.e., if $t_1$ produces
a token consumed or read by $t_2$.
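
The causality relation can be computed as a transitive closure. The sketch below assumes a hypothetical encoding in which each transition is a dictionary with the keys `pre`, `post` and `context` holding sets of places; the function name `causality` and the three example transitions are illustrative only.

```python
def causality(transitions):
    """Least transitive relation containing (a, b) whenever a produces a
    token that b consumes or reads."""
    direct = {
        (a, b)
        for a, ta in transitions.items()
        for b, tb in transitions.items()
        if ta["post"] & (tb["pre"] | tb["context"])
    }
    closure = set(direct)
    while True:
        # compose the relation with itself until nothing new appears
        extra = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if extra <= closure:
            return closure
        closure |= extra


transitions = {
    "t1": {"pre": set(), "context": set(), "post": {"s1"}},
    "t2": {"pre": {"s1"}, "context": set(), "post": {"s2"}},
    "t3": {"pre": set(), "context": {"s2"}, "post": set()},
}
print(sorted(causality(transitions)))
```

Here `t1` causes `t2` directly, `t2` causes `t3` through a read arc, and the pair `(t1, t3)` is added by transitivity.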

In ordinary Petri nets, two transitions $t_1$ and $t_2$ competing for a resource,
i.e., which have a common place in the pre-set, are said to be in conflict. The
presence of read arcs leads to an asymmetric form of conflict: if a transition $t_2$
“consumes” a token which is “read” by $t_1$ then the execution of $t_2$ prevents $t_1$
from being executed, while the sequence “$t_1$ followed by $t_2$” is legal. The asymmetric
conflict $\nearrow_N$ can be formally defined by $t_1 \nearrow_N t_2$ if $\underline{t_1} \cap {}^{\bullet}t_2 \neq \emptyset$ or ${}^{\bullet}t_1 \cap {}^{\bullet}t_2 \neq \emptyset$.
The last clause includes the ordinary symmetric conflict as asymmetric conflict
in both directions, i.e., $t_1 \nearrow_N t_2$ and $t_2 \nearrow_N t_1$. More generally, transitions
occurring in a cycle of asymmetric conflicts $t_1 \nearrow_N t_2 \nearrow_N \dots \nearrow_N t_n \nearrow_N t_1$ cannot
appear in the same computation since each of them should precede all the others.
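
Asymmetric conflict and the resulting cycle condition can be sketched as follows. The encoding of transitions as dictionaries with the keys `pre` and `context`, the function names and the example are assumptions for illustration; the sketch also assumes that the relation is only considered between distinct transitions.

```python
def asymmetric_conflict(transitions):
    """Pairs (a, b) such that a reads or consumes a token consumed by b."""
    return {
        (a, b)
        for a, ta in transitions.items()
        for b, tb in transitions.items()
        if a != b and ((ta["context"] & tb["pre"]) or (ta["pre"] & tb["pre"]))
    }


def has_conflict_cycle(names, conflict):
    """Depth-first search for a cycle of asymmetric conflicts within names."""
    succ = {n: [b for (a, b) in conflict if a == n and b in names] for n in names}
    state = {}  # 1 = on the current search path, 2 = fully explored

    def visit(n):
        state[n] = 1
        for m in succ[n]:
            if state.get(m) == 1 or (m not in state and visit(m)):
                return True
        state[n] = 2
        return False

    return any(n not in state and visit(n) for n in names)


transitions = {
    "t1": {"pre": set(), "context": {"s"}},     # reads s
    "t2": {"pre": {"s", "u"}, "context": set()},  # consumes s and u
    "t3": {"pre": {"u"}, "context": set()},     # consumes u
}
conflict = asymmetric_conflict(transitions)
print(has_conflict_cycle({"t1", "t2"}, conflict))
print(has_conflict_cycle({"t2", "t3"}, conflict))
```

The pair `t1`, `t2` is only in asymmetric conflict in one direction, so both can occur in one computation (with `t1` first), whereas `t2` and `t3` compete for the same token and form a cycle of length two.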

The unfolding of a graph grammar with negative application conditions is 
defined as a Petri graph [5], i.e., a graph with a Petri net “over it”, using the 
edges of the graph as places. 

Definition 6 (Petri graph). Let $\mathcal{G} = (\mathcal{R}, G_0)$ be a graph grammar. A Petri
graph (over $\mathcal{G}$) is a tuple $P = (G, N)$ where $G$ is a hypergraph, $N$ is a Petri net
with read and inhibitor arcs whose places are the edges of $G$, i.e., $S_N = E_G$, and
the labelling $p_N\colon T_N \to \mathcal{R}$ of the net maps the transitions to the graph rewriting
rules of $\mathcal{G}$. A Petri graph with initial marking is a tuple $(P, m_0)$ where $m_0 \in E_G^\oplus$.

Each transition in the Petri net will be interpreted as an occurrence of a graph 
production at a given match. Note that Definition 6 does not ask that the pre- 
set, post-set, context or inhibitor places of a transition t have any relation with 
the corresponding graph rewriting rule $p_N(t)$, but the unfolding construction
presented later will ensure a close relation. 

As in Petri net theory, a marking $m \in E_G^\oplus$ is called safe if any place (edge)
contains at most one token. A safe marking $m$ of a Petri graph $P = (G, N)$
can be seen as a graph, i.e., the least subgraph of $G$ including exactly the edges
which contain a token in $m$. Such a graph, denoted by $graph(m)$, is called the
graph generated by $m$.

We next describe how a suitable unfolding can be produced from the gen- 
erating/optimising grammars associated to a code generator. We introduce the 
criteria and conditions that must be met step by step. At first, the graph gram- 
mars are unfolded disregarding the negative application conditions. 

Petri graph corresponding to a rewriting rule: Every graph rewriting rule can be 
represented as a Petri graph without considering negative application conditions: 
Take both the left-hand side L and the right-hand side R , merge edges and 
nodes that belong to the context and add a transition, recording which edges 
are deleted, preserved and created. For instance the Petri graph P corresponding 
to rule (CreateConn1) in Fig. 2 is depicted in Fig. 5 (see the Petri graph in the
middle). Observe that the transition preserves the edges labelled + and I (read 
arcs are indicated by undirected dotted lines) and produces an edge labelled E. 
In this case no edges are deleted. In order to distinguish connections of the graph 
and connections between transitions and places, we draw the latter as dashed 
lines. 

Unfolding step: The initial Petri graph is obviously the start graph Go of the 
generating grammar, with no transitions. At every step, we first search for a 
match of a left-hand side L, belonging to a graph production r. This match 
must be potentially coverable (concurrent), i.e., it must not contain items which 
are causally related and the set of causes of the items in the match must be
conflict-free (negative application conditions will be taken into account later).
We now take the Petri graph $P$ associated to $r$ and merge the edges and nodes
of $L$ in $P$ with the corresponding items of the occurrence of $L$ in the partial
unfolding. Fig. 5 exemplifies this situation for an incomplete unfolding $\mathcal{U}$ and
the Petri graph $P$ representing rule (CreateConnl) of Fig. 2. The left-hand side
$L$, which indicates how the merging is to be performed, is marked in grey.

Fig. 5. Performing an unfolding step.

The unfolding of a given graph grammar is usually infinite and thus its con- 
struction could continue for an arbitrarily long time. To avoid this we employ 
two mechanisms, called depth restriction and width restriction. 

Depth restriction: The idea consists of truncating the unfolding after a certain 
depth has been reached. To make this more formal, define the depth of a transition
$t$ to be the length of the longest sequence $t_0 <_{\mathcal{U}} t_1 <_{\mathcal{U}} \dots <_{\mathcal{U}} t_n <_{\mathcal{U}} t$,
where $<_{\mathcal{U}}$ is the causality relation. The depth of an edge is the depth of the
unique transition having such an edge in its post-set. If the edge is in the start 
graph, then its depth is set to 0. For our purposes it is not necessary to define the 
depth of a node. Then we fix a parameter k , called depth restriction, asking that 
no items of depth greater than k are ever created by the unfolding construction. 
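
A minimal Python sketch of the depth computation follows. It assumes that the depth counts the transitions on the longest causal chain including $t$ itself, so that a transition without causes has depth 1 and start-graph edges keep depth 0; the source does not fix this counting convention explicitly. The dictionaries `direct_causes` and `producer` are hypothetical encodings.

```python
def transition_depth(t, direct_causes, memo=None):
    """Number of transitions on the longest causal chain ending in t (t included)."""
    memo = {} if memo is None else memo
    if t not in memo:
        memo[t] = 1 + max(
            (transition_depth(c, direct_causes, memo) for c in direct_causes.get(t, ())),
            default=0,
        )
    return memo[t]


def edge_depth(edge, producer, direct_causes):
    """Depth of the unique transition producing the edge; 0 for start-graph edges."""
    t = producer.get(edge)
    return 0 if t is None else transition_depth(t, direct_causes)


def within_depth(t, direct_causes, k):
    """Depth restriction: only transitions of depth at most k may be added."""
    return transition_depth(t, direct_causes) <= k


# Hypothetical causal structure: t3 depends on t1 and t2, t2 depends on t1.
direct_causes = {"t1": [], "t2": ["t1"], "t3": ["t1", "t2"]}
producer = {"e1": "t1", "e3": "t3"}  # e0 is assumed to belong to the start graph
```

The recursion terminates because the unfolding is acyclic.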

Width restriction: Depth restriction is not sufficient to keep the unfolding finite, 
since matches of a left-hand side could be unfolded more than once. To stop after 
a finite number of steps we impose the following conditions: 

(1) A rule r which deletes at least one edge, i.e., for which the left-hand side is 
strictly larger than the context, is applied only once to every match. Note that 
we would not gain anything from unfolding such a match twice: since at least 
one token is consumed by firing the corresponding transition and the unfolding 
is acyclic, the pre-set could never be covered again and thus it would not be 
possible to fire another copy of the same transition. 

(2) A rule r which does not delete any edge, i.e., for which the left-hand side 
is equal to the context, is unfolded w times for every match, where w is a fixed 
parameter called width restriction. The different copies of this transition can
potentially all be fired since no edge is ever consumed. Actually, since each copy
of the transition has an empty pre-set it could possibly be fired more than once, 
leading to more than one token in a place, a situation which must be avoided in 
the unfolding where each transition is intended to represent a single occurrence 
of firing and each place a single occurrence of a token. This problem is solved by 
introducing a dummy place - initially marked - as the pre-set of such transitions, 
ensuring that every transition is fired only once. 
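
Conditions (1) and (2) can be sketched in Python as follows. The function `unfold_match`, the dictionary layout of a transition and the rule names are hypothetical; the sketch only shows how many copies are created for one match and where the dummy place enters.

```python
def unfold_match(rule_name, lhs, context, match_id, w):
    """Return the transitions (as dicts) unfolded for one match of a rule.

    `lhs` and `context` are sets of matched edges; edges in lhs but not in
    context are deleted by the rule.
    """
    deleted = set(lhs) - set(context)
    copies = 1 if deleted else w  # conditions (1) and (2)
    transitions = []
    for i in range(copies):
        pre_set = set(deleted)
        if not pre_set:
            # Fresh, initially marked dummy place: this copy can fire only once.
            pre_set = {("dummy", rule_name, match_id, i)}
        transitions.append(
            {"rule": rule_name, "copy": i, "pre": pre_set, "context": set(context)}
        )
    return transitions


# A deleting rule is unfolded once, a non-deleting rule w = 2 times.
deleting = unfold_match("deleting_rule", {"e1"}, set(), "m1", w=2)
preserving = unfold_match("preserving_rule", {"e1", "e2"}, {"e1", "e2"}, "m2", w=2)
```

Each copy of the non-deleting rule receives its own dummy place, so the copies remain independent of each other.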

Generating grammar before optimising grammar: We still have to avoid mixing 
the two grammars. So far it is still possible to create an unfolding that includes 
sequences of rewriting steps where rules of the generating grammar are applied 
to the start graph, followed by the application of rules of the optimising grammar 
and then again by rules of the generating grammar. Derivations of this kind do 
not model any interesting situation: in practice, first the model is created, and 
only then are optimising steps allowed. Hence we impose that whenever there 
are transitions $t_1$ and $t_2$ such that $t_1 <_{\mathcal{U}} t_2$ and $p_N(t_2)$ is a rule of the generating
grammar, then also $p_N(t_1)$ must be a rule of the generating grammar. In such a
situation we say that $<_{\mathcal{U}}$ is compatible with the grammar ordering.
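
The compatibility condition amounts to a simple check over the causality pairs, sketched here in Python. The function name and the encoding of causality as a set of pairs are assumptions made for illustration.

```python
def compatible_with_grammar_ordering(causality, rule_of, generating_rules):
    """Check the grammar ordering on pairs (t1, t2), read as "t1 causally precedes t2".

    Whenever t2 is labelled by a generating rule, t1 must be as well.
    """
    return all(
        rule_of[t1] in generating_rules
        for t1, t2 in causality
        if rule_of[t2] in generating_rules
    )


# Hypothetical labelling of three transitions.
rule_of = {"t1": "create", "t2": "optimise", "t3": "create"}
generating = {"create"}
ok = compatible_with_grammar_ordering({("t1", "t2")}, rule_of, generating)
bad = compatible_with_grammar_ordering({("t2", "t3")}, rule_of, generating)
```

In the second call an optimising transition causes a generating one, which is exactly the kind of derivation the construction excludes.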

Add inhibitor arcs: In the next step, the final unfolding is obtained by taking 
every transition t in the Petri graph, labelled by a rule r, considering the corre- 
sponding match and adding, for any falsifying edge, an inhibitor arc. Inhibitor 
arcs are represented by dotted lines with a small circle at one end. 

Initial marking: The initial marking contains exactly the edges of the start graph 
and, in addition, all dummy places that were created during the unfolding. 

The structure produced by the above procedure is referred to as the unfolding
up to depth $k$ and width $w$ and denoted by $\mathcal{U}_k^w$.

Example (continued) : Fig. 6 shows a part of the unfolding for the grammars 
of the running example. We assume that the depth restriction k is at least 3, 
the width restriction is at least 2 and the arithmetic depth a is also at least 
2. Table 1 shows the labelling of transitions over rewriting rules and the causal 
depth of each transition. The depth of every dummy place is 0, while the depth 
of any other place (edge) is the depth of the transition which has this edge in 
its post-set. Note that two inhibitor arcs at transitions $t_{10}$ and $t_{11}$ in Fig. 6 are
inserted because of the presence of falsifying edges. 

The unfolding faithfully represents system behaviour in the following sense. 

Proposition 1. Let $G_0$ be the start graph of the generating grammar and let
$G_0 \Rightarrow^* G$ be a derivation of $G$ such that: (i) the derivation consists of at most $k$
(possibly concurrent) steps, (ii) no rewriting rule is applied more than $w$ times to
the same match, (iii) rules of the optimising grammar are applied only after those
of the generating grammar. Then there is a reachable marking $m$ in the unfolding
truncated at depth $k$ and width $w$, such that $G$ is isomorphic to $graph(m)$ up to
isolated nodes. Furthermore, for every reachable marking $m$ there is a graph $G$
such that $G_0 \Rightarrow^* G$ and $G$ is isomorphic to $graph(m)$, up to isolated nodes.

Fig. 6. A part of the unfolding for the example grammars (constant folding).

Table 1. Correspondence between transitions and rules: (a) grammar $\mathcal{G}_g$, (b) grammar $\mathcal{G}_o$.

6 Generating Test Cases 

The application order of optimisation techniques is not fixed a priori, but de- 
pends on specific situations within the input graph. Hence, with respect to test- 
ing, the situation is quite different from that of imperative programs for which 
there exists a widely accepted notion of coverage. In order to achieve in our case 
an adequate coverage for possible optimisation applications (of a single optimi- 
sation rule or a combination of different optimisation techniques) we propose to 
derive (graphical) input models which trigger the application of a single opti- 
misation step or trigger the “combination” of several rules. In the last case the 
occurrences of the selected rules should be causally dependent on each other or 
in asymmetric conflict, such that error-prone interactions of rules can be tested. 

Test cases triggering such a behaviour can be derived from the unfolding (up 
to depth k and up to width w) which provides a very compact description of all 
graphs which can be reached and of all rules which can be applied in a certain 
number of steps (see Proposition 1).

Fig. 7. Schematic representation of test case generation.



In the following, we denote by $\mathcal{R}_g$ the set of rules of the generating grammar,
by $G_s$ its start graph and by $\mathcal{R}_o$ the set of optimising rules.

Definition 7 (Test case). Given a set of optimising rules $R \subseteq \mathcal{R}_o$, a test case
for $R$ is an input model $G$ such that a computation of the optimiser over $G$ can
use all the rules in $R$.

Take a set R of interesting rules the interaction of which should be tested. 
The set R can be determined by the tester or by general principles, for instance 
one could take all sets R up to a certain size. Then proceed as follows: 

(1) Take a set $T$ of transitions in $\mathcal{U}_k^w$ (the unfolding up to depth $k$ and up
to width $w$) labelled by rules in $R$ such that (a) for all $r \in R$, there exists a
transition $t_r \in T$ such that $t_r$ is labelled by $r$; (b) for all transitions $t_r \in T$ there
exists a transition $t'_r \in T$ which is related to $t_r$ by asymmetric conflict or causal
dependency.

(2) Look for a set $T'$ of transitions in $\mathcal{U}_k^w$ such that $T \subseteq T'$, and exactly the
transitions of $T'$ can be fired in a derivation of the grammar. Note that not every
set $T$ can be extended to such a $T'$, since transitions might be in conflict or block
each other by inhibitor arcs.

(3) Take the subset of transitions in $T'$ labelled by rules in $\mathcal{R}_g$ and fire such
rules, obtaining a marking $m$. Then $graph(m)$ is a test case for $R$.

See Fig. 7 for a schematical representation of the above procedure. Whenever 
the specification is non-deterministic, we cannot guarantee that the execution 
over the test case really involves the transformation rules in R, but this is a 
problem inherent to the testing of non-deterministic systems. 
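
Step (3) of the procedure can be sketched in Python under simplifying assumptions: markings are sets of places (safe markings), a transition is a hypothetical tuple (pre-set, context, post-set, inhibitor places), and a firing order compatible with the chosen configuration is supplied by the caller. None of these names comes from the source.

```python
def fire(marking, transition):
    """Fire one transition on a safe marking represented as a set of places."""
    pre, context, post, inhibitors = transition
    enabled = pre <= marking and context <= marking and not (inhibitors & marking)
    if not enabled:
        raise ValueError("transition is not enabled")
    return (marking - pre) | post


def derive_test_case_marking(initial, firing_order, transitions, rule_of, generating_rules):
    """Fire, in the given order, only the transitions labelled by generating rules."""
    marking = set(initial)
    for t in firing_order:
        if rule_of[t] in generating_rules:
            marking = fire(marking, transitions[t])
    return marking


# Hypothetical T': t1 is generating (pre-set is a dummy place d1), t2 is optimising.
transitions = {
    "t1": ({"d1"}, {"s"}, {"a"}, set()),
    "t2": ({"a"}, set(), {"b"}, set()),
}
rule_of = {"t1": "gen_rule", "t2": "opt_rule"}
m = derive_test_case_marking({"s", "d1"}, ["t1", "t2"], transitions, rule_of, {"gen_rule"})
```

The resulting marking still contains the place consumed by the optimising transition, since only the generating part of $T'$ is fired; applying $graph(\cdot)$ to it yields the input model.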

The set $T'$ can be concretely defined by resorting to the notions of configuration
and history in the theory of inhibitor Petri nets [7,3]. Roughly, a
configuration of $\mathcal{U}_k^w$ is a pair $(C, <_C)$ where $C$ is a set of transitions closed under
causality and $<_C$ is a partial order including causality $<_{\mathcal{U}}$, asymmetric conflict
$\nearrow_{\mathcal{U}}$ and a relation $<_p$ which considers the effects of inhibitor arcs: for any
place $s$ connected to a transition $t \in C$ by means of an inhibitor arc (i.e., $s$ is an
inhibitor place of $t$) it
chooses if $t$ is executed before the place is filled or after the place is emptied.
A configuration $(C, <_C)$ can be seen as a concurrent computation, $<_C$ being a
computational ordering on transitions, in the sense that the transitions in $C$ can
be fired in any total order compatible with $<_C$. A configuration $(C, <_C)$ is called
proper if the partial order $<_C$ is compatible with the grammar ordering, i.e., if
$t_1 <_C t_2$ and $t_2$ belongs to the generating grammar then also $t_1$ belongs to the
generating grammar.

The history of a transition $t$ in a configuration $(C, <_C)$, denoted by $C[t]$, is
the set of transitions which must precede $t$ in any computation represented by
$C$. Formally, $C[t] = \{t' \in C \mid t' <_C t\}$. Note that a transition $t$ can have several
possible histories in different configurations. This is caused by the presence of
read arcs and, even more severely, by inhibitor arcs. With asymmetric conflict
only, there is a least history, the set of (proper) causes $\lfloor t \rfloor = \{t' \mid t' <_{\mathcal{U}} t\}$,
and the history of an event in a given configuration is completely determined by
the configuration itself. With inhibitor arcs, in general, there might be several
histories of a transition in a given configuration, and even several minimal ones.
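
Given the ordering of a configuration as a set of pairs, a history can be computed by a backward traversal, as in the following Python sketch. The function name, the pair encoding and the `reflexive` flag are assumptions: the flag defaults to including $t$ itself, because the example below lists a transition among its own histories, but the strict reading of the formula is available as well.

```python
def history(t, order, reflexive=True):
    """Transitions that precede t in the order generated by the given pairs.

    `order` is a set of pairs (a, b), read as "a before b"; it need not be
    transitively closed.
    """
    preds = {}
    for a, b in order:
        preds.setdefault(b, set()).add(a)
    seen, stack = set(), [t]
    while stack:
        for p in preds.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen | {t} if reflexive else seen


# Hypothetical configuration ordering with an unrelated transition e.
order = {("a", "b"), ("b", "d"), ("c", "d"), ("a", "e")}
h = history("d", order)
h_strict = history("d", order, reflexive=False)
```

Transition e is not part of the history of d, since nothing forces it to fire before d.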

Hence, coming back to the problem in step (2) above, the set $T'$ we are looking
for can be defined as a proper configuration including T. The choice among the 
possible configurations T' including T could be influenced by the actual needs 
of the tester. In many cases the obvious choice will be to privilege configurations 
with minimal cardinality, since these contain only the events which are strictly 
necessary to make the rules in T applicable. 

Example (continued): We continue with our running example. Assume that
we want a test case including the application of rule (ConstantFoldingSum),
i.e., $R = \{$(ConstantFoldingSum)$\}$ (the procedure works in the same way for
more than one rule). Transition $t_{11}$ is an instance of this rule and it is contained
in several different configurations, for example $H_1 = \{t_?, t_3, t_4, t_5, t_6, t_{11}\}$
which creates only a sum with two integers and corresponding connections,
$H_2 = \{t_1, t_2, t_3, t_4, t_5, t_6, t_7, t_8, t_{11}\}$ which creates another E-edge and removes
it by (ConstantSplitting) and $H_3 = \{t_1, t_2, t_3, t_4, t_5, t_6, t_7, t_9, t_{10}, t_{11}\}$ which creates
another E-edge and another sum and removes them by (KillUselessFunction)
and (KillLonelyEdge). All these configurations are histories of $t_{11}$. Depending on
the choice of the history, one obtains the two different test cases (the first for
$H_1$ and the other for $H_2$ and $H_3$) in Fig. 8.

To understand how such a test case is derived, consider the history $H_3$.
Causality ($<_{\mathcal{U}}$), asymmetric conflict ($\nearrow_{\mathcal{U}}$) and the relation $<_p$ in $H_3$ are depicted
in Fig. 9. Note, for instance, that $t_7$ is forced to fire before $t_{11}$, since $t_7$
corresponds to a rule of grammar $\mathcal{G}_g$, while $t_{11}$ is labelled with a rule of $\mathcal{G}_o$.
After firing $t_7$, transition $t_{11}$ is blocked by an inhibitor arc, and $t_{11}$ can only be
enabled by firing $t_9$ and $t_{10}$. Thus $t_9 <_p t_{10} <_p t_{11}$ is the only possible choice.
Furthermore $t_7$ and $t_9$ are in asymmetric conflict ($t_7 \nearrow t_9$), since $t_9$ removes
an element of the context of $t_7$. By taking the subset of rules in $H_3$ which belong
to the generating grammar $\mathcal{G}_g$, namely $t_1, t_2, t_3, t_4, t_5, t_6, t_7$, and firing them, we
obtain the test case on the right-hand side of Fig. 8.



7 Conclusion 

We have presented a technique for deriving test cases for code generators with a 
graphical source language. The technique is based on the formalisation of code 
generators by means of graph transformation rules and on the use of (variants 
of the) unfolding semantics as a compact description of their behaviour.

Fig. 9. Causality, asymmetric conflict and relation $<_p$.



The novelty of our approach consists in the fact that we consider graphical 
models and that we generate a compact description of system behaviour from 
which we can systematically derive test cases triggering specific behaviours. By 
using an unfolding technique we can avoid considering all interleavings of con- 
current events, thereby preventing combinatorial explosion to a large extent. We 
believe that this technique can also be very useful for testing programs of visual 
programming languages. 

This paper does not address efficiency issues. Note that the causality rela- 
tion and asymmetric conflict of an occurrence net can be computed statically 
without firing the net. Hence, it is important to note that without inhibitor 
arcs, configurations and histories and hence test cases can be determined in a 
very efficient way. In the presence of inhibitor arcs, it is necessary to construct
suitable relations $<_p$ leading to configurations. Obtaining such relations $<_p$ is
quite involved and requires efficient heuristics, which we have already started
to develop in view of an upcoming implementation of the test case generation 
procedure.

Abstract. To formalize, measure, and predict availability properties,
stochastic concepts are required. Reconfiguration and communication in
mobile and distributed environments, where reasoning on such properties
is most important due to the high volatility of network connections, are
best described by graph transformation systems.

Consequently, in this paper we introduce stochastic graph transformation 
systems, following the outline of stochastic Petri nets. Besides the basic 
definition and a motivating example, we discuss the analysis of properties 
expressed in continuous stochastic logic including an experimental tool 
chain. 



1 Introduction 

Non-functional requirements concerning the availability of a system, measured 
in terms of mean time between failures, time to repair, or the average or maximal 
answer time, play an increasingly important role in mainstream software development.
This is largely due to the change of focus from applications running
on single machines or reliable local-area networks to Web-based distributed and 
mobile applications, where connections may be broken or varying in quality, or 
servers may be temporarily down. 

Individual occurrences of failures are generally unpredictable. Therefore, 
stochastic concepts are required to formalize, measure, and predict availabil- 
ity properties. Specification formalisms providing suitable stochastic extensions 
include, for example, transition systems (i.e., Markov chains [2,19]), stochastic 
Petri nets [18,5] or process algebras [6,10]. In order to meet the limitations of 
available analysis techniques, these formalisms mostly abstract from functional 
and architectural aspects like application data, changes in the network topology, 
etc. 

However, even simple mobile devices today, like cell phones or PDAs, are
equipped with communication and computation power beyond that of stationary
computers a few years ago. In order to manage the resulting logical complexity of
applications, high-level models of the functionality of the systems are required.
This calls for specification techniques which are able to integrate functional and
architectural aspects with non-functional (stochastic) requirements.

* Research funded in part by Deutsche Forschungsgemeinschaft, grant DO 263/8-1
[Algebraische Eigenschaften stochastischer Relationen] and by European Community's
Human Potential Programme under contract HPRN-CT-2002-00275, [SegraVis].

Therefore, with the approach to stochastic graph transformation presented 
in this paper, we deliberately do not limit ourselves to models that are easy to 
analyze. We are convinced that, first of all, a problem-oriented style of modeling 
is required, because models have to be understood and validated by humans 
before they can be subject to automated analysis. 

The paper is structured as follows. After discussing related work and outlining
our approach, in Sect. 3 we present the basic definitions of typed graph transformation
systems along with a simple example of a wireless network. Sect. 4 is
devoted to the definition of Markov chains and Continuous Stochastic Logic (CSL).
In Sect. 5 we make use of stochastic concepts by associating rates of exponential
probability distributions with the rules of a graph transformation system and
describing the Markov chain generated from it. Sect. 6 concludes the paper with
a discussion of tools and relevant theoretical problems.

2 Related Work 

The presented approach inherits from two lines of research: stochastic modeling 
and analysis and graph transformation for mobility. We discuss both of them in 
turn. 

Stochastic Modeling and Analysis. The underlying model for stochastic analysis
is provided by Markov chains, i.e., transition systems labeled with probability
distributions on transitions [2,19,4]. Stochastic Petri nets provide a convenient
method of describing Markov chains. The reachability graph of the net provides
the underlying transition system, and its state transitions are decorated with the
probabilities of the net transitions from which they are generated [5]. A similar
idea lies behind stochastic process algebras where process algebras like CCS or
the π-calculus are used to describe the transition system [6].

Generalizing from these examples, the idea of stochastic modeling can be 
phrased as follows. A state-based formalism is used to specify the desired be- 
havior. From suitable annotations in the specification, probability distributions 
for the transitions of the generated transition systems are derived which provide 
the input to stochastic analysis techniques. 

Our approach follows the same strategy, replacing Petri nets or process al- 
gebra with graph transformation systems. In this way, we obtain a high-level 
formalism in which both functional and non-functional aspects of mobile sys- 
tems can be adequately specified. 

Graph Transformation for Mobility. Graph transformation systems have been 
used for describing the semantics of languages for mobility, like the Ambient 
calculus [13] as well as corresponding extensions of the UML [3]. However, we 
will be more interested in direct applications to the modeling of mobile and 
distributed systems.

Generally, we may distinguish approaches which take a strictly local
perspective, modeling a distributed system from the point of view of a single node, e.g.,
[16,9] from those taking a global point of view, e.g., [24,14]. In the latter case, 
each rule specifies the preconditions and effects of a potentially complex protocol 
with multiple participants, rather than a single operation as in the former case. 

Our approach follows the second, global style of specification, with the run- 
ning example derived from [14]. However, we think that the same combination 
of graph transformation with stochastic concepts could be applied to the more 
local style of specification. 

3 Typed Graph Transformation 

In this section, we provide the basic notions of typed graphs and their trans- 
formation according to the so-called algebraic or double-pushout (DPO) ap- 
proach [12,8]. 

Typed Graphs. By a graph we mean a directed unlabeled graph
$G = (G_V, G_E, src^G, tar^G)$ with a set of vertices $G_V$, a set of edges $G_E$, and functions
$src^G : G_E \to G_V$ and $tar^G : G_E \to G_V$ associating to each edge its source
and target vertex. A graph homomorphism $f : G \to H$ is a pair of functions
$(f_V : G_V \to H_V, f_E : G_E \to H_E)$ preserving source and target.

In this paper, vertices shall represent hardware components, with geographic 
relations, signals, and communication links as edges. The class of all admissible 
configurations is defined by means of a type graph, i.e., a graph specifying the
available types of components and connections. Fixing a type graph $TG$, an
instance graph $(G, g)$ over $TG$ is a graph $G$ equipped with an attributed graph
morphism $g : G \to TG$. A morphism of typed graphs $h : (G_1, g_1) \to (G_2, g_2)$ is a
graph homomorphism $h : G_1 \to G_2$ that preserves the typing, that is, $g_2 \circ h = g_1$.
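
These definitions translate directly into a small Python sketch. The class `Graph`, the function `is_homomorphism` and the concrete type graph below (including the direction of the signal and connected edges) are assumptions made for illustration; the typing $g$ of an instance graph is simply a homomorphism into the type graph.

```python
from dataclasses import dataclass


@dataclass
class Graph:
    vertices: set
    edges: set
    src: dict  # edge -> source vertex
    tar: dict  # edge -> target vertex


def is_homomorphism(G, H, f_v, f_e):
    """Check that (f_v, f_e) maps G into H and preserves source and target."""
    return (
        all(f_v.get(v) in H.vertices for v in G.vertices)
        and all(f_e.get(e) in H.edges for e in G.edges)
        and all(
            f_v[G.src[e]] == H.src[f_e[e]] and f_v[G.tar[e]] == H.tar[f_e[e]]
            for e in G.edges
        )
    )


# Hypothetical type graph: stations, devices and three kinds of edges.
TG = Graph(
    {"Station", "Device"},
    {"neighbor", "signal", "connected"},
    {"neighbor": "Station", "signal": "Station", "connected": "Station"},
    {"neighbor": "Station", "signal": "Device", "connected": "Device"},
)
# Instance graph with two stations and two devices.
G = Graph(
    {"S1", "S2", "D1", "D2"},
    {"n1", "sig1", "con1"},
    {"n1": "S1", "sig1": "S2", "con1": "S2"},
    {"n1": "S2", "sig1": "D2", "con1": "D2"},
)
g_v = {"S1": "Station", "S2": "Station", "D1": "Device", "D2": "Device"}
g_e = {"n1": "neighbor", "sig1": "signal", "con1": "connected"}
well_typed = is_homomorphism(G, TG, g_v, g_e)
ill_typed = is_homomorphism(G, TG, g_v, {**g_e, "sig1": "neighbor"})
```

Typing the signal edge as a neighbor edge fails because its target is a device, while neighbor edges must end in a station.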

Example 1 (mobile system: types and configurations). Figure 1 on the left shows
the type graph of our running example, a nomadic wireless network like a mobile 
phone network or wireless LAN. Stations linked by a geographic neighborhood 
relation form the static part of the network. To communicate with mobile de- 
vices, signals are broadcast and connections may be established. Stations can be 
broken, indicated by a Boolean attribute ok : Bool with value ok = false, otherwise
ok = true. (For simplicity, we have restricted our presentation to typed
graphs, disregarding attributes. However, attributes of a finite data type, like 
Boolean, can be encoded in the graphical structure.) The idea is that a Device 
receiving the signal of a Station (because it is in range and the station is not 
broken) may establish a connection. 

On the right, Fig. 1 shows a sample instance graph over the type graph, a 
state with two stations and two devices with the right device receiving the signal 
of and maintaining a connection with the right-hand side station. 

Graph Transformation. The DPO approach to graph transformation has been 
developed for vertex- and edge-labeled graphs in [12] and extended to typed 
graphs in [8]. 




Stochastic Graph Transformation Systems 



Fig. 1. Type graph TG and sample instance graph (G, g).



Given a type graph $TG$, a $TG$-typed graph transformation rule is a span of
injective $TG$-typed graph morphisms $p = (L \xleftarrow{l} K \xrightarrow{r} R)$, called a rule span.
The left-hand side $L$ contains the items that must be present for an application
of the rule, the right-hand side $R$ those that are present afterwards, and the
gluing graph $K$ specifies the "gluing items", i.e., the objects which are read
during application, but are not consumed.

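A direct transformation $G \stackrel{p(o)}{\Longrightarrow} H$ is given by a double-pushout (DPO) diagram
$o = \langle o_L, o_K, o_R \rangle$ as shown below, where (1), (2) are pushouts and top and
bottom are rule spans.

\[
\begin{array}{ccccc}
L & \xleftarrow{\;l\;} & K & \xrightarrow{\;r\;} & R \\
\downarrow{\scriptstyle o_L} & (1) & \downarrow{\scriptstyle o_K} & (2) & \downarrow{\scriptstyle o_R} \\
G & \xleftarrow{\;g\;} & D & \xrightarrow{\;h\;} & H
\end{array}
\]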

If we are not interested in the rule and diagram of the transformation we will
write $G \stackrel{t}{\Longrightarrow} H$ or just $G \Longrightarrow H$. We will also identify the transformation step
(i.e., the DPO diagram) with the label of the arrow, like in $t = G \stackrel{p(o)}{\Longrightarrow} H$.

The DPO diagram o is a categorical way of representing the occurrence of 
a rule in a bigger context. Operationally, it formalizes the replacement of a 
subgraph in a graph by two gluing diagrams, called pushouts. The left-hand 
side pushout (1) is responsible for removing the occurrence of L \ l(K) in G, 
resulting in graph D. The right-hand side pushout (2) adds a copy of R \ r(K) 
to D leading to the derived graph H. 
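
The operational reading of the two pushouts can be sketched in Python for the simple case of an injective match and a gluing graph $K$ that is a subgraph of both $L$ and $R$. The function `dpo_step`, the set-and-dictionary encoding of graphs and the naming scheme for fresh items are assumptions of this sketch; typing is ignored, and fresh names may clash if the same rule is applied repeatedly.

```python
def dpo_step(G, L, K, R, m_v, m_e):
    """Apply rule L <- K -> R to G at the injective match (m_v, m_e).

    Graphs are pairs (vertices, edges) with edges: dict id -> (src, tar);
    K is assumed to be a subgraph of both L and R. Returns H, or None if
    the dangling condition fails and no pushout complement exists.
    """
    G_v, G_e = G
    del_v = {m_v[v] for v in L[0] - K[0]}
    del_e = {m_e[e] for e in L[1].keys() - K[1].keys()}
    # Dangling condition: a surviving edge must not lose an endpoint.
    if any(e not in del_e and (s in del_v or t in del_v) for e, (s, t) in G_e.items()):
        return None
    # Pushout (1): D is G without the occurrence of L \ l(K).
    D_v = G_v - del_v
    D_e = {e: st for e, st in G_e.items() if e not in del_e}
    # Pushout (2): glue a fresh copy of R \ r(K) onto D.
    h_v = {v: m_v[v] for v in K[0]}
    h_v.update({v: ("new", v) for v in R[0] - K[0]})
    H_v = D_v | set(h_v.values())
    H_e = dict(D_e)
    for e in R[1].keys() - K[1].keys():
        s, t = R[1][e]
        H_e[("new", e)] = (h_v[s], h_v[t])
    return H_v, H_e


# Hypothetical rule adding a connection between a station s and a device d.
G = ({"S1", "S2", "D1", "D2"}, {"n": ("S1", "S2"), "g1": ("S2", "D2")})
L = ({"s", "d"}, {"sig": ("s", "d")})
R = ({"s", "d"}, {"sig": ("s", "d"), "con": ("s", "d")})
H = dpo_step(G, L, L, R, {"s": "S2", "d": "D2"}, {"sig": "g1"})
# Deleting the vertex D2 alone would leave the edge g1 dangling.
blocked = dpo_step(G, ({"d"}, {}), (set(), {}), (set(), {}), {"d": "D2"}, {})
```

The second call returns None, reflecting that pushout (1) cannot be constructed when an edge outside the match would lose its endpoint.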

In general, we will not be interested in representing the intermediate graph
$K$ of a rule separately. We will thus assume that it forms a subgraph of both $L$
and $R$ such that $K = L \cap R$, denoting this rule by $p : L \Rightarrow R$. If they are clear
from the context, we will drop the indices $L$, $K$, $R$ of the occurrence morphisms.

A graph transformation system consists of a type graph and a set of rules 
which can, in general, be infinite. This can result in an infinite number of steps
outgoing from a single graph. To avoid this, we will have to make sure that only 
a finite number of rules is applicable to each finite graph. 

A related problem is a consequence of the categorical formalization which 
defines the derived graph only up to isomorphism. Indeed, for a given rule and 
occurrence there may exist an infinite (even uncountable) number of results, all 
isomorphic copies of each other. This is, of course, a disaster for state space 
analysis.

It has been pointed out [7] that the naive solution of considering arbitrary
isomorphism classes is unsatisfactory because the history of vertices and edges 
becomes confused if there is more than one isomorphism between two graphs. 
The solution proposed in [7] is based on identifying a canonical isomorphism for 
each pair of isomorphic graphs in order to establish a notion of identity across 
different graphs. Instead, we will explicitly remember the elements of interest by 
extending graphs with variable assignments. 

Variable Assignments and Parameterized Rules. The introduction of assign- 
ments prepares the ground for interpreting (stochastic) temporal logic over graph 
transformation systems. Moreover, in order to derive a transition system with 
meaningful labels, we introduce rules with formal parameters. 

Definition 1 (graph transformation with assignments). A graph transformation
system (with parameterized rules) $\mathcal{G} = (TG, P, \pi)$ consists of

- a type graph $TG$
- a set of rule names $P$
- a mapping $\pi$ associating with every rule name $p$ a $TG$-typed rule span
  $s = (L \xleftarrow{l} K \xrightarrow{r} R)$ and a list of formal parameters
  $e_1, \dots, e_n \in L \cup R$, i.e., $\pi(p) = (e_1 \dots e_n, s)$.

In this case, we say that $p(e_1 \dots e_n) : s$ is a rule of $\mathcal{G}$.

Fixing a countable set of variables $X$, an assignment in a graph $G$ is a partial
mapping $a_G : X \to G_V + G_E$ into the disjoint union of $G$'s vertices and edges.
The transformation of graphs with assignments is defined as follows. Given
a graph with assignment $(G, a_G)$ and a rule $p(e_1 \dots e_n) : L \xleftarrow{l} K \xrightarrow{r} R$,
there exists a transformation $(G, a_G) \stackrel{p(x_1 \dots x_n)}{\Longrightarrow} (H, a_H)$ with actual parameters
$x_1, \dots, x_n \in X$, whenever

\[
\begin{array}{ccccc}
L & \xleftarrow{\;l\;} & K & \xrightarrow{\;r\;} & R \\
\downarrow{\scriptstyle o_L} & & \downarrow{\scriptstyle o_K} & & \downarrow{\scriptstyle o_R} \\
G & \xleftarrow{\;g\;} & D & \xrightarrow{\;h\;} & H
\end{array}
\]

(in the diagram, the assignments $a_G$ and $a_H$ map the variables in $X$ into $G$ and $H$)

(i) a transformation from $G$ to $H$ via $p$ can be constructed, represented by the
double-pushout diagram above;

(ii) assignments $a_G, a_H$ are compatible with the bottom span of the transformation,
i.e., there exists an assignment $a_D : X \to D$ such that $g \circ a_D \subseteq a_G$
and $h \circ a_D \subseteq a_H$ (see footnote 1);

(iii) occurrences $o_L, o_R$ are compatible with assignments $a_G, a_H$, i.e., for all
$1 \le i \le n$: $e_i \in L$ implies $o_L(e_i) = a_G(x_i)$ and $e_i \in R$ implies
$o_R(e_i) = a_H(x_i)$ (see footnote 2).

1 By $f \subseteq f'$ we mean that $f'$ is defined whenever $f$ is, and in these cases they coincide.

2 Here we mean strong equality, i.e., both sides of the equation need to be defined
(and, of course, equal).

A transformation sequence with assignments in $\mathcal{G}$

\[
(G_0, a_{G_0}) \stackrel{p_1(x_{11}, \dots, x_{1n_1})}{\Longrightarrow} (G_1, a_{G_1}) \stackrel{p_2(x_{21}, \dots, x_{2n_2})}{\Longrightarrow} \cdots \stackrel{p_k(x_{k1}, \dots, x_{kn_k})}{\Longrightarrow} (G_k, a_{G_k})
\]

is a sequence of consecutive transformation steps with $p_i \in P$, briefly denoted
by $(G_0, a_{G_0}) \Rightarrow^*_{\mathcal{G}} (G_k, a_{G_k})$, such that variables are single-assignment, i.e., if
$x_{ij} = x_{i'j'}$ for $i < i'$, then the first is an output and the second is an input
parameter and $a_{G_{i+1}}(x_{ij})$ is related to $a_{G_{i'}}(x_{i'j'})$ via the bottom spans of the
transformations from $G_{i+1}$ to $G_{i'}$.



In a transformation rule $p(e_1 \dots e_n) : s$, the $e_i$ play the role of abstract
parameters, i.e., input parameters if $e_i \in L$ and output parameters if $e_i \in R$.
Variables in $X$ represent references to graph elements that are used in expressions
like $p(x_1 \dots x_n)$ to denote a step from $G$ to $H$ where rule $p$ is applied at an
occurrence which maps each $e_i \in L$ to $a_G(x_i) \in G$ and each $e_i \in R$ to $a_H(x_i) \in H$
(cf. condition (iii) above). Thus, the $x_i$ are logical variables used to represent
the concrete counterparts of the abstract parameters $e_i$.

Condition (ii) states that assignments are stable for all elements that are pre- 
served by the transformation. The last condition over transformation sequences 
ensures that a repeated occurrence of a variable in a label denotes the same 
graphical object at different points in time. 

Now we are ready to define the labeled transition system induced by a graph
transformation system. The labels shall be given by rule names with actual
parameters. The state space is not built from concrete graphs, but from isomorphism
classes of graphs with assignments

\[
[(G, a_G)] = \{(H, a_H) \mid \exists \text{ isomorphism } i : G \to H \text{ such that } i \circ a_G = a_H\}.
\]

Definition 2 (induced labeled transition system). Let $\mathcal{G}$ be a graph transformation
system, $X$ be a fixed set of variables, and $(G_0, a_{G_0})$ be an initial graph
with assignment. The transformations in $\mathcal{G}$ create a labeled transition system
$LTS(\mathcal{G}, X, (G_0, a_{G_0})) = (L, S, \Rightarrow)$, the induced labeled transition system, where

- $L = \{p(x_1, \dots, x_n) \mid p \in P \wedge x_1, \dots, x_n \in X\}$ is the set of rule names with
  actual parameters from $X$,
- $S = \{[(G_n, a_{G_n})] \mid (G_0, a_{G_0}) \Rightarrow^*_{\mathcal{G}} (G_n, a_{G_n})\}$ is the set of isomorphism
  classes of graphs with assignments reachable from $(G_0, a_{G_0})$,
- $\Rightarrow \subseteq S \times L \times S$ is the transformation relation on graphs with assignments,
  lifted to isomorphism classes, i.e.,
  $[(G, a_G)] \stackrel{p(x_1 \dots x_n)}{\Longrightarrow} [(H, a_H)]$ iff
  $(G_0, a_{G_0}) \Rightarrow^*_{\mathcal{G}} (G, a_G) \stackrel{p(x_1 \dots x_n)}{\Longrightarrow} (H, a_H)$.
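
For a finite state space, the induced transition system can be built by a breadth-first exploration, sketched below in Python. The functions `successors` and `canonical` are hypothetical parameters supplied by the caller: the first enumerates labelled steps from a state, the second returns a hashable representative of the isomorphism class of a graph with assignment. The sketch does not enforce the single-assignment condition on variables.

```python
from collections import deque


def induced_lts(initial, successors, canonical, max_states=10_000):
    """Breadth-first construction of (labels, states, transitions).

    successors(state) yields (label, next_state) pairs; canonical(state)
    returns a hashable representative of the state's isomorphism class.
    Both are supplied by the caller and are assumptions of this sketch.
    """
    start = canonical(initial)
    states, labels, transitions = {start}, set(), set()
    queue = deque([initial])
    while queue and len(states) <= max_states:
        state = queue.popleft()
        for label, nxt in successors(state):
            key = canonical(nxt)
            labels.add(label)
            transitions.add((canonical(state), label, key))
            if key not in states:
                states.add(key)
                queue.append(nxt)
    return labels, states, transitions


# Toy instance: a single station referenced by x that can fail and be repaired.
def successors(ok):
    yield ("repair(x)", True) if not ok else ("fail(x)", False)


labels, states, transitions = induced_lts(True, successors, canonical=lambda s: s)
```

The bound `max_states` guards against the infinite state spaces that arise when infinitely many rules or isomorphic copies are not ruled out.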

Note that the last item implies that all sequences in the transition system 
from the initial state satisfy the single-assignment condition implicit in the notion 
of transformation sequences.

Fig. 2. Failure and repair rules: fail(s) changes a station s:Station from ok = true to ok = false; repair(s) changes it from ok = false to ok = true.



Example 2 (mobile system: rules and transformations). Graph transformation
rules model the failure and recovery of components, their movement, the estab- 
lishment and loss of connections, etc. Failure and repair of stations are expressed 
by the rules in Fig. 2. 

Devices may move into and out of the sending range of stations, thus losing
and regaining the signal as specified at the top of Fig. 3. A negative application
condition, shown as a crossed-out signal edge [15], ensures that there is at
most one signal edge between a station and a device. More often than moving 
out of range entirely, devices should move between cells covered by neighboring 
stations, as shown in the bottom left of the figure. Finally, when a station is 
broken, its signal is lost as shown in Fig. 3 on the right. 




Fig. 3. Movement of devices and losing the signal due to station failure.



If a device has a signal and no connection, a connection may be established,
as shown in Fig. 4 on the left. If a device does not have a signal from the station
it is connected to, the connection is lost, too; cf. Fig. 4 on the right.




Fig. 4. Establishing a connection and losing it due to loss of a signal.

In order to ensure continuous connectivity, many networks provide handover
protocols. One possible behavior is specified in Fig. 5: If a device does not have
a signal from the station it was connected to, but has one from a neighboring
station, the connection may be handed over to the second station.




Fig. 5. Handover between two stations. 



Figure 6 shows an application of rule connect(s,d) with corresponding actual
parameters x, y. (As for rules, we skip the intermediate graph D in the sample
transformation.) In accordance with the occurrences, the assignments map x to
S2 and y to D2. Graph properties can be expressed by rules with equal left- and
right-hand sides, whose applicability appears as a loop in the transition system.
Figure 6 shows the property rule con(d) with L = R := P indicating whether the
device matching the formal parameter d is connected. In graph G this pattern
does not find an occurrence. In graph H the property is satisfied if we instantiate
d with y (referencing D2), but not for x (referencing D1).



[Figure: graphs G and H, each containing S1:Station (ok=false), S2:Station (ok=true), D1:Device and D2:Device, with neighbor and signal edges; H additionally contains a connected edge. The step from G to H is labeled connect(x, y) and applies rule connect(s,d) at the occurrence {s -> S2, d -> D2}; the assignments are a_G = a_H = {x -> S2, y -> D2}. The property rule con(d) consists of s2:Station and d2:Device joined by a connected edge; con(y) holds in H at the occurrence {s2 -> S2, d2 -> D2}.]

Fig. 6. Transformation step and state property with variable assignment. 
4 Markov Chains and Stochastic Logic

Having described how to model mobility with graph transformation systems,
and how non-functional requirements can be expressed on the basis of this model,
we define Continuous-Time Markov Chains (CTMCs) in this section and
explain how they can be analyzed. Furthermore, Continuous Stochastic Logic
(CSL) is introduced to make assertions on CTMCs.

Markov Chains. First we provide some basic notions, adopting the Q-matrix, a
kind of “incidence matrix” of the Markov Chain, as the elementary notion (cf. [19]).

Definition 3 (Q-matrix). Let $S$ be a countable set. A Q-matrix on $S$ is a
real-valued matrix $Q = (Q(s, s'))_{s, s' \in S}$ satisfying the following conditions:

(i) $0 \leq -Q(s, s) < \infty$ for all $s \in S$,

(ii) $Q(s, s') \geq 0$ for all $s \neq s'$,

(iii) $\sum_{s' \in S} Q(s, s') = 0$ for all $s \in S$.
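For a finite state set, the three conditions of Definition 3 can be checked mechanically. The sketch below assumes the matrix is given as a square list of lists of numbers; the function name `is_q_matrix` and the tolerance parameter are choices made for this illustration.

```python
import math

def is_q_matrix(q, tol=1e-9):
    """Check conditions (i)-(iii) of the Q-matrix definition for a finite matrix."""
    n = len(q)
    for s in range(n):
        if len(q[s]) != n:
            return False
        # (i) the diagonal entry is finite and non-positive
        if not (math.isfinite(q[s][s]) and q[s][s] <= 0):
            return False
        # (ii) off-diagonal entries are non-negative
        if any(q[s][t] < 0 for t in range(n) if t != s):
            return False
        # (iii) each row sums to zero (up to rounding)
        if abs(sum(q[s])) > tol:
            return False
    return True
```

For example, `is_q_matrix([[-1.0, 1.0], [480.0, -480.0]])` holds, while a matrix with a non-zero row sum or a negative off-diagonal entry is rejected.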

The Q-matrix is also called transition rate matrix. Some authors (e.g. [2]) 
use a more general notion of Q-matrix and call the matrices defined above stable 
and conservative. The most important notion is captured in the following 

Definition 4 (CTMC). A (homogeneous) Continuous-Time Markov Chain is
a pair $(S, Q)$ where $S$ is a countable set of states and $Q$ is a Q-matrix on $S$.

If $s \neq s'$ and $Q(s, s') > 0$, then there is a transition from $s$ to $s'$. The
transition delay is exponentially distributed with rate $Q(s, s')$. Consequently, the
probability that, being in $s$, the transition $s \to s'$ can be triggered within a time
interval of length $t$ is $1 - e^{-Q(s, s') t}$. The total exit rate $Q(s) = -Q(s, s)$ specifies
the rate of leaving a state $s$ to any other state. If the set $\{s' \mid Q(s, s') > 0\}$ is
not a singleton, then there is a competition between the transitions originating
in $s$. The probability that transition $s \to s'$ wins the ‘race’ is ([2], §1.2, Prop. 2.8)

$$\frac{Q(s, s')}{Q(s)}.$$
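The race between competing transitions can be made concrete with a few lines of code. This is a sketch under stated assumptions: the rate matrix `q` is a hypothetical three-state example, and the helper names are invented for illustration. Sampling uses the standard-library functions `random.expovariate` and `random.choices`.

```python
import math
import random

def exit_rate(q, s):
    """Total exit rate Q(s) = -Q(s, s)."""
    return -q[s][s]

def prob_within(rate, t):
    """Probability that an exponential delay with the given rate elapses within time t."""
    return 1.0 - math.exp(-rate * t)

def race_probabilities(q, s):
    """Probability Q(s, s') / Q(s) that each transition out of s wins the race."""
    total = exit_rate(q, s)
    return {t: r / total for t, r in enumerate(q[s]) if t != s and r > 0}

def sample_step(q, s, rng):
    """Sample a sojourn time in s and the successor state (s must not be absorbing)."""
    delay = rng.expovariate(exit_rate(q, s))
    targets = race_probabilities(q, s)
    successor = rng.choices(list(targets), weights=list(targets.values()))[0]
    return delay, successor

# Hypothetical three-state rate matrix, used only for illustration.
q = [[-3.0, 1.0, 2.0],
     [0.5, -0.5, 0.0],
     [4.0, 0.0, -4.0]]
probs = race_probabilities(q, 0)   # state 1 wins with 1/3, state 2 with 2/3
delay, successor = sample_step(q, 0, random.Random(0))
```

Repeating `sample_step` from the returned successor yields a simulated path of the chain, which is one simple way to cross-check numerically computed probabilities.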




The transition probability matrix $P(t) = (P_{ss'}(t))_{s, s' \in S}$ describes the
dynamic behavior of a CTMC. It is the minimal non-negative solution of the equation

$$P'(t) = Q P(t), \quad P(0) = I.$$

The $(s, s')$-indexed entry of $P(t)$ specifies the probability that the system is
in state $s'$ after time $t$ if it is in state $s$ at present. Given an initial distribution
$\pi(0)$, the transient solution $\pi(t) = (\pi_s(t))_{s \in S}$ is then

$$\pi(t) = \pi(0) P(t).$$

In the finite case, $P(t)$ can be computed by the matrix exponential function,
$P(t) = e^{Qt}$, but the numerical behavior of the matrix exponential series is
rather unsatisfactory [23]. Apart from the transient solution, which specifies the
behavior as time evolves, the steady state or invariant distribution is of great
interest.
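One common alternative to summing the exponential series directly is uniformization, which rewrites $e^{Qt}$ as a Poisson-weighted sum of powers of a stochastic matrix. The sketch below is an illustration of that technique, not a method prescribed by the text; it uses a fixed number of terms without error control, which is only adequate when the product of the largest exit rate and $t$ is small. The function names are assumptions of this example.

```python
import math

def transient_matrix(q, t, terms=60):
    """Approximate P(t) = exp(Q t) by uniformization for a finite Q-matrix."""
    n = len(q)
    lam = max(-q[s][s] for s in range(n)) or 1.0   # uniformization rate
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    # Uniformized discrete-time chain U = I + Q / lam (a stochastic matrix).
    u = [[identity[i][j] + q[i][j] / lam for j in range(n)] for i in range(n)]
    power = identity                                # U^0
    weight = math.exp(-lam * t)                     # Poisson weight for k = 0
    result = [[weight * x for x in row] for row in power]
    for k in range(1, terms):
        power = [[sum(power[i][m] * u[m][j] for m in range(n)) for j in range(n)]
                 for i in range(n)]
        weight *= lam * t / k
        for i in range(n):
            for j in range(n):
                result[i][j] += weight * power[i][j]
    return result

def transient_distribution(pi0, q, t):
    """pi(t) = pi(0) P(t)."""
    p = transient_matrix(q, t)
    n = len(q)
    return [sum(pi0[i] * p[i][j] for i in range(n)) for j in range(n)]

# Hypothetical two-state chain with rates 2 (state 0 to 1) and 3 (state 1 to 0).
q = [[-2.0, 2.0], [3.0, -3.0]]
p = transient_matrix(q, 0.5)
```

For a two-state chain the result can be compared with the closed form $P_{00}(t) = \frac{3}{5} + \frac{2}{5} e^{-5t}$, which is what the entry `p[0][0]` approximates here.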

Definition 5 (invariant distribution). A map $\pi : S \to [0, 1]$ is an invariant
distribution if

$$\pi Q = 0 \quad \text{and} \quad \sum_{s \in S} \pi_s = 1.$$
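For a finite, irreducible Q-matrix the two conditions of Definition 5 form a linear system with a unique solution. The following sketch solves it exactly with rational arithmetic; the function name and the two-state example (failure rate 1, repair rate 480) are assumptions made for illustration, and the code raises an error if no pivot is found, as can happen for reducible chains.

```python
from fractions import Fraction

def invariant_distribution(q):
    """Solve pi Q = 0 with sum(pi) = 1 for a finite irreducible Q-matrix."""
    n = len(q)
    # Row j of the system holds column j of Q: sum_i pi_i * Q(i, j) = 0.
    a = [[Fraction(q[i][j]) for i in range(n)] for j in range(n)]
    b = [Fraction(0)] * n
    # One balance equation is redundant; replace it by the normalization.
    a[-1] = [Fraction(1)] * n
    b[-1] = Fraction(1)
    # Gauss-Jordan elimination with exact rational arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col and a[r][col] != 0:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
                b[r] -= factor * b[col]
    return [b[i] / a[i][i] for i in range(n)]

# Hypothetical two-state chain: state 0 = working, state 1 = broken.
pi = invariant_distribution([[-1, 1], [480, -480]])
```

In this example the chain spends a fraction 480/481 of the time in the working state and 1/481 in the broken state.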



In the example system shown in Fig. 7, the vector $\pi$ = [value missing in the source] is an
invariant distribution and the transient solution is given by [formula missing in the source].
Note that each row of $P(t)$ converges to $\pi$ as $t \to \infty$, so the invariant distribution
is always approached in the long term and is independent of the initial distribution,
a property which can only be assured under certain reachability conditions.
We say that a state $s'$ can be reached from $s$, and write $s \to s'$, if there are states
$s = s_0, \ldots, s_n = s'$ such that $Q(s_0, s_1) \cdot Q(s_1, s_2) \cdots Q(s_{n-1}, s_n) > 0$. If $s \to s'$
and $s' \to s$, then we say that $s$ and $s'$ communicate, and write $s \leftrightarrow s'$.



Definition 6 (irreducible Q-matrix). A Q-matrix $Q$ is irreducible if $s \leftrightarrow s'$
for all $s, s' \in S$.



Non-irreducible matrices can be partitioned into their irreducible components
for analysis. We will therefore consider only irreducible matrices. It can be shown
that the first condition of Definition 5 is equivalent to $\pi P = \pi$ if $Q$ is finite and
irreducible. The entries of the transition probability matrix then converge to the
steady state,

$$\lim_{t \to \infty} P_{ss'}(t) = \pi_{s'}.$$

In the infinite case, stronger assumptions are necessary (positive recurrence, [19],
Th. 3.5.5 and 3.6.2).
A Stochastic Temporal Logic. We use extended Continuous Stochastic Logic
(CSL) as presented in [4] to describe properties of CTMCs. Suppose that a
labelling function $L : S \to 2^{AP}$ is given, associating to every state $s$ the set of
atomic propositions $L(s) \subseteq AP$ that are valid in $s$. The syntax of CSL is:

$$\Phi ::= tt \mid a \mid \neg\Phi \mid \Phi_1 \wedge \Phi_2 \mid \mathcal{S}_{\bowtie p}(\Phi) \mid \mathcal{P}_{\bowtie p}(\Phi_1 \,\mathcal{U}^{I}\, \Phi_2)$$

where $\bowtie \in \{\leq, \geq\}$, $p \in [0, 1]$, $a \in AP$ and $I \subseteq \mathbb{R}$ is an interval. The other
boolean connectives are defined as usual, i.e., $ff = \neg tt$, $\Phi \vee \Psi = \neg(\neg\Phi \wedge \neg\Psi)$ and
$\Phi \to \Psi = \neg\Phi \vee \Psi$. The steady-state operator $\mathcal{S}_{\bowtie p}(\Phi)$ asserts that the steady-state
probability of the formula $\Phi$ meets the bound $\bowtie p$. The operator $\mathcal{P}_{\bowtie p}(\Phi_1 \,\mathcal{U}^{I}\, \Phi_2)$
asserts that the probability measure of the paths satisfying $\Phi_1 \,\mathcal{U}^{I}\, \Phi_2$ meets the
bound $\bowtie p$.³

A path through a CTMC $M$ is an alternating sequence $\sigma = s_0 t_0 s_1 t_1 \ldots$ with
$Q(s_i, s_{i+1}) > 0$. The time stamps $t_i$ denote the sojourn time in state $s_i$. Let
$Path^M$ be the set of all paths through $M$. Then $Prob^M(s, \Phi_1 \,\mathcal{U}^{I}\, \Phi_2)$ is defined as
$Pr_s\{\sigma \in Path^M \mid \sigma \models \Phi_1 \,\mathcal{U}^{I}\, \Phi_2\}$, where $Pr_s$ denotes the probability measure on sets of
paths that start in $s$, as defined in [4], and $\Phi_1 \,\mathcal{U}^{I}\, \Phi_2$ asserts that $\Phi_2$ will be satisfied
at some time instant in the interval $I$ and that at all preceding time instants $\Phi_1$
holds. We define the semantics of CSL as follows:

$$\begin{array}{lcl}
s \models tt & & \text{for all } s \in S \\
s \models a & \iff & a \in L(s) \\
s \models \neg\Phi & \iff & s \not\models \Phi \\
s \models \Phi_1 \wedge \Phi_2 & \iff & s \models \Phi_1 \text{ and } s \models \Phi_2 \\
s \models \mathcal{S}_{\bowtie p}(\Phi) & \iff & \sum_{s' \models \Phi} \pi_{s'} \bowtie p \\
s \models \mathcal{P}_{\bowtie p}(\Phi_1 \,\mathcal{U}^{I}\, \Phi_2) & \iff & Prob^M(s, \Phi_1 \,\mathcal{U}^{I}\, \Phi_2) \bowtie p
\end{array}$$



CSL Example. Consider the CTMC of Fig. 7 with initial distribution $\pi(0) = (1, 0, 0)$.
Define an atomic proposition $a$ to be true in states 1 and 2, i.e. $L(1) = L(2) = \{a\}$,
$L(3) = \emptyset$. Then the formula $\mathcal{S}_{\geq 0.5}(a)$ is true, because in the steady
state, $a$ is fulfilled with probability 0.5, whereas $\mathcal{P}_{\geq 0.9}(a \,\mathcal{U}^{[0,1]}\, \neg a)$ is false, as
the probability is only 0.86.
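Both kinds of formula can be evaluated numerically on a small chain. The sketch below uses a hypothetical three-state chain (states 0, 1, 2 with rates 1, 1 and 2 around a cycle), not the chain of Fig. 7, so it does not reproduce the values 0.5 and 0.86 quoted above. It relies on the standard reduction for time-bounded until over an interval $[0, t]$: states satisfying $\Phi_2$ or violating $\Phi_1$ are made absorbing, and the probability is read off the transient distribution. All function names are assumptions of this example.

```python
import math

def transient_from(q, start, t, terms=200):
    """Transient distribution at time t by uniformization, starting in `start`."""
    n = len(q)
    lam = max(-q[s][s] for s in range(n)) or 1.0
    vec = [1.0 if s == start else 0.0 for s in range(n)]
    weight = math.exp(-lam * t)
    result = [weight * x for x in vec]
    for k in range(1, terms):
        # one step of the uniformized chain: vec * (I + Q / lam)
        vec = [vec[j] + sum(vec[i] * q[i][j] for i in range(n)) / lam for j in range(n)]
        weight *= lam * t / k
        result = [r + weight * x for r, x in zip(result, vec)]
    return result

def bounded_until(q, start, phi1, phi2, t):
    """Probability of phi1 U^[0,t] phi2 from `start`; phi1 and phi2 are sets of states."""
    n = len(q)
    absorbing = {s for s in range(n) if s in phi2 or s not in phi1}
    cut = [[0.0] * n if s in absorbing else list(q[s]) for s in range(n)]
    dist = transient_from(cut, start, t)
    return sum(dist[s] for s in phi2)

def steady_state_at_least(pi, sat, bound):
    """S_{>= bound}(phi): compare the steady-state mass of the states in `sat`."""
    return sum(pi[s] for s in sat) >= bound

# Hypothetical chain 0 -> 1 -> 2 -> 0; proposition a holds in states 0 and 1.
q = [[-1.0, 1.0, 0.0], [0.0, -1.0, 1.0], [2.0, 0.0, -2.0]]
a_states, not_a_states = {0, 1}, {2}
prob = bounded_until(q, 0, a_states, not_a_states, 1.0)
holds = steady_state_at_least([0.4, 0.4, 0.2], a_states, 0.5)
```

Here `prob` approximates $1 - 2/e \approx 0.264$, the probability that two consecutive rate-1 delays both elapse within one time unit, and the invariant distribution $(0.4, 0.4, 0.2)$ of this chain is passed in directly.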



5 Stochastic Graph Transformation Systems 

A stochastic graph transformation system associates with each rule name a pos- 
itive real number representing the rate of the exponentially distributed delay of 
its application. 

Definition 7 (stochastic GTS). A stochastic graph transformation system
$\mathcal{SG} = (TG, P, \pi, \rho)$ consists of a graph transformation system $(TG, P, \pi)$ and a
function $\rho : P \to \mathbb{R}^+$ associating with every rule its application rate $\rho(p)$.

Example 3 (mobile system: application rates). The application rates for the rules 
of our mobility example are shown in the following table. 

3 The other path and state operators can be derived. Details are given in [4]. 
rule name p     rate ρ(p)
repair          [missing]
moveIn          [missing]
fail            [missing]
moveOut         [missing]
connect         [missing]
disconnect      [missing]
looseSig        100000
handOver        [missing]



For convenience, we only use integer values for rates. We assume that the
average time needed to repair a broken host is much smaller than the average
time until the next failure (the expected value of the application delay is just
the inverse of the rate). If one interprets the proposed values with the
unit per day, this means that, on average, an error occurs once a day and
can be repaired in approximately three minutes.
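The relation between rates and mean delays can be written out as a short worked calculation. The two rates used here are assumptions chosen to match the figures quoted in the prose (one failure per day, about three minutes to repair); they are not taken from the table.

```latex
% Mean of an exponentially distributed delay T_p with rate \rho(p)
\mathbb{E}[T_p] = \frac{1}{\rho(p)}

% Assumed rates: \rho(\mathit{fail}) = 1 per day, \rho(\mathit{repair}) = 480 per day
\mathbb{E}[T_{\mathit{fail}}] = \frac{1}{1}\,\text{day} = 1\,\text{day},
\qquad
\mathbb{E}[T_{\mathit{repair}}] = \frac{1}{480}\,\text{day}
  = \frac{1440}{480}\,\text{min} = 3\,\text{min}
```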

Moving in and out are assumed to happen equally often, but much less
frequently than moving into another cell. When a station fails, the signal is lost
almost immediately, so the rate is very high. Connecting, and disconnecting
when the signal is lost, are assumed to happen equally fast. Finally, handover
has to be possible within a few seconds to guarantee stability of the connection.



Exponential distributions are single-parameter distributions that have a wide 
range of applications in analyzing the reliability and availability of electronic 
systems. Modeling a component’s reliability with an exponential distribution 
presupposes that the failure rate is constant, which is generally true for electronic 
components during the main portion of their useful life. This means that the life 
of a component is independent of its current age (the memoryless property). 

User mobility and connection duration can also be modeled as exponentially 
distributed, see [11,26]. Of course, more detailed and realistic models are not 
confined to this approach but use other stochastic techniques, too, in order to 
take into account aspects like speed or direction [27]. 



From Stochastic GTS to Markov Chains. We now show how a stochastic graph 
transformation system gives rise to a Markov Chain, so that the analysis tech- 
niques described in Sect. 4 can be applied. First, we need an important notion. 

Definition 8 (finitely-branching). Let $LTS = (L, S, \Rightarrow)$ be a labeled transition
system. Let $R(s, s') := \{p \in P \mid \exists x_1, \ldots, x_n \in X : s \xRightarrow{p(x_1, \ldots, x_n)} s'\}$ be the set of all
transitions between $s$ and $s'$. $LTS$ is called finitely-branching iff $R(s, S) := \bigcup_{s' \in S} R(s, s')$,
the set of all rules applicable to $s$, is finite for all $s \in S$.

We are now ready for the main result: 

Proposition 1 (and Definition: induced Markov chain). Let $\mathcal{SG} = (TG, P, \pi, \rho)$
be a stochastic graph transformation system with start graph
$(G_0, a_{G_0})$ and let the induced labeled transition system $LTS(\mathcal{G}, G_0) = (L, S, \Rightarrow)$
be finitely-branching. Assume for all $s \in S$ that $\rho(p) = 0$ if $p \in R(s, s)$. We set⁴

$$Q(s, s') = \begin{cases} \sum_{p \in R(s, s')} \rho(p) & \text{for } s \neq s', \\ -\sum_{t \neq s} Q(s, t) & \text{for } s = s'. \end{cases}$$

Then $(S, Q)$ is a Continuous-Time Markov Chain, the induced Markov chain
of $\mathcal{SG}$.

4 We use the convention $\sum_{\emptyset} = 0$ for the empty sum.

Proof. We have to show that $Q$ is a Q-matrix on the set of all graphs reachable
from the start graph $G_0$ by applying the transformation rules, and so $(S, Q)$ is a
CTMC. The finite-branching property is crucial to ensure that condition (i) of
Def. 3 is fulfilled, because the sum $\sum_{p \in R(s, S)} \rho(p)$ is finite as is its indexing set.

It also implies that $R(s, s')$ is finite for all $s \neq s'$. So compliance with (ii) is
secured by Def. 7 as all $\rho(p)$ have to be finite, and part (iii) of Def. 3 is fulfilled
trivially because of the definition of $Q(s, s)$.

The assumption that $\rho(p) = 0$ if $p \in R(s, s)$ for all $s \in S$ is not needed
formally, but as we ignore loops, we require them to have infinite delay, so that
they will never be applied. □
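The construction of the induced Markov chain can be sketched directly from a finite labeled transition system. The following is an illustration under assumptions: transitions are given as triples whose label is a pair of rule name and argument tuple, the Q-matrix is stored as nested dictionaries, and the two-state example with rules `fail`, `repair` and the property rule `broken` is hypothetical.

```python
from collections import defaultdict

def induced_q(states, transitions, rate):
    """Build Q(s, s') from labeled transitions and per-rule application rates.

    `transitions` contains (source, (rule, args), target) triples and `rate`
    maps rule names to rates. Loops (source == target) are ignored.
    """
    rules_between = defaultdict(set)          # (s, s') -> R(s, s')
    for source, (rule, _args), target in transitions:
        if source != target:
            rules_between[(source, target)].add(rule)
    q = {s: {t: 0.0 for t in states} for s in states}
    for (s, t), rules in rules_between.items():
        q[s][t] = sum(rate[p] for p in rules)
    for s in states:
        q[s][s] = -sum(q[s][t] for t in states if t != s)
    return q

# Hypothetical example: one station that fails and is repaired; "broken" is a
# property rule and therefore appears as a loop with rate 0.
states = ["up", "down"]
transitions = {
    ("up", ("fail", ("x",)), "down"),
    ("down", ("repair", ("x",)), "up"),
    ("down", ("broken", ("x",)), "down"),
}
rate = {"fail": 1.0, "repair": 480.0, "broken": 0.0}
q = induced_q(states, transitions, rate)
```

Since $R(s, s')$ is defined as a set of rule names, a rule contributes its rate once per pair of states in this sketch, even if several parameter instantiations lead from $s$ to $s'$.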

If the initial graph is finite, the same holds for all graphs in the system since
transformations preserve finiteness. In this case, the finite-branching condition
is ensured if every finite graph has only a finite number of applicable rules. This
is trivial in the case of a finite set of rules in the system. However, there may be
occasions where the set of rules is infinite, e.g., when rules with path expressions
are regarded as rule schemata expanding to countably infinite sets of rules. In
this case, the condition may be violated if the given graph contains a cycle and
thus an infinite number of paths.

The initial distribution $\pi(0)$ is given by $\pi_s(0) = 1$ for $s = [(G_0, a_{G_0})]$ and
$\pi_s(0) = 0$ otherwise. As discussed above, for assuring existence of a unique
steady-state solution it is beneficial if the Markov Chain is finite and irreducible, i.e., every
state is reachable from every other state in the system. This property, which
can be checked on the graph transition system, is typical for non-deterministic
models like the one given by our running example. Indeed, this model does not
specify an individual application with determined behavior, but rather a whole
class of mobile systems with similar structure and behavior, comparable to an
architectural style [14]. For the case of infinite systems, analysis is possible under
certain conditions (positive recurrence).

Stochastic Logic for Induced Markov Chains. In order to use CSL for analyzing 
stochastic graph transformation systems, we have to define the set of atomic 
propositions AP as well as the labeling function L. 

Definition 9 (stochastic logic over graph transformation systems). Assume
a stochastic graph transformation system $\mathcal{SG} = (TG, P, \pi, \rho)$, a set of
variables $X$, and an initial graph with assignment $(G_0, a_{G_0})$. We define

$$AP = \{p(x_1, \ldots, x_n) \mid p \in P, x_i \in X\}$$

as the set of all rule names with actual parameters, and the labeling of states

$$L(s) = \{l \mid s \xRightarrow{l} t\}$$

to be the set of all labels (instantiated rule names) on transitions outgoing from
a state.
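The labeling of Definition 9 can be read off the transition relation directly. The sketch below assumes transitions are given as (source, label, target) triples with string labels; the function names and the small example are invented for illustration.

```python
def labeling(states, transitions):
    """L(s): all instantiated rule names on transitions leaving s."""
    labels = {s: set() for s in states}
    for source, label, _target in transitions:
        labels[source].add(label)
    return labels

def satisfying(labels, proposition):
    """States in which the atomic proposition holds."""
    return {s for s, props in labels.items() if proposition in props}

# Hypothetical transition system; con(y) is a property rule and appears as a loop.
states = ["G", "H"]
transitions = {("G", "connect(x,y)", "H"), ("H", "con(y)", "H")}
labels = labeling(states, transitions)
connected = satisfying(labels, "con(y)")
```

A proposition such as `con(y)` thus holds exactly in the states where the corresponding property rule is applicable.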
Thus, we can reason about the applicability of rules to graph elements
referenced by variables in X, with the special case of property rules whose left- and
right-hand sides are copies of a pattern defining a structural graph property (cf.
property con(d) in Fig. 6). The transition rates of property rules are set to 0, so
that they do not affect the Q-matrix.

In our example, with $X = \{x, y, z\}$ and assignment $a_G = \{x \mapsto D2, y \mapsto D1, z \mapsto S1\}$
into graph $G$ in Fig. 1, this enables us to answer the following
questions.

— In the long run, is station $z$ broken at most 1% of the time: $\mathcal{S}_{\leq 0.01}(z.ok = false)$? True.

— Is the overall connectivity of device $x$: $\mathcal{S}_{\geq 0.5}(con(y))$? Yes, it is 0.555.

— Is the probability of device $y$ being connected before $t$ days from now at
least 0.9: $\mathcal{P}_{\geq 0.9}(true \,\mathcal{U}^{[0,t]}\, con(y))$? True for, e.g., $t > 1.25$.

Tool Support. This interpretation of stochastic logic, although semantically sat- 
isfactory, has the disadvantage of being “non-standard”: The construction of 
states with assignments is not supported by any graph transformation tool. 
However, even when analyzing the small example in this paper, we have found 
that tool support is indispensable in a stochastic setting, much more so than for 
the analysis of standard transition systems. 

A possible way out that worked well in the example is the encoding of as- 
signments into the graphical structure. The idea is to introduce a vertex for 
each logical variable together with an edge pointing to the referenced graph el- 
ement. Extending the rules in a similar way and ensuring by means of types 
that variables are not confused, we obtain a slightly awkward, but semantically 
equivalent encoding of the transition system. 

We have implemented this idea in the GROOVE tool [20], which allows the
simulation of graph transformation systems and exports the labeled transition
system generated from it. Given this system, and providing the rules’ application
rates in a separate file, we can generate the Markov Chain in the input
language of the probabilistic model checker PRISM [17]. Extracting loops via
property rules from the transition system, we also generate the labeling of states
by atomic propositions. Thus we can verify CSL formulas with instantiated graph
patterns as atomic propositions against CTMCs generated from stochastic graph
transformation systems.
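To give an impression of what such generated input might look like, the sketch below renders a finite rate matrix as a PRISM CTMC model with a single state variable and one command per transition. It is an illustration only and does not reproduce the converter described above; the function name, the module name `chain` and the label name `broken` are assumptions.

```python
def to_prism(q, labels, initial=0):
    """Render a finite CTMC as PRISM model text.

    `q` is a square list-of-lists rate matrix; `labels` maps a label name to
    the set of state indices in which it holds.
    """
    n = len(q)
    lines = ["ctmc", "", "module chain", f"  s : [0..{n - 1}] init {initial};"]
    for i in range(n):
        for j in range(n):
            if i != j and q[i][j] > 0:
                # one command per transition, annotated with its rate
                lines.append(f"  [] s={i} -> {q[i][j]} : (s'={j});")
    lines.append("endmodule")
    for name, sat in sorted(labels.items()):
        cond = " | ".join(f"s={i}" for i in sorted(sat)) or "false"
        lines.append(f'label "{name}" = {cond};')
    return "\n".join(lines)

# Hypothetical two-state chain: state 0 = working, state 1 = broken.
model = to_prism([[-1.0, 1.0], [480.0, -480.0]], {"broken": {1}})
```

A steady-state query on the resulting model could then be phrased in PRISM's property language, for example `S<=0.01 [ "broken" ]`.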

6 Conclusion 

We have proposed an approach for analyzing graph transformation systems with 
stochastic methods. We have shown that under certain conditions, a stochastic 
graph transformation system induces a Continuous Time Markov Chain. This 
opens the door to a wide range of applications in modeling, and to tools and 
numerical methods for analysis. 

We have constructed an experimental tool chain consisting of GROOVE and
PRISM. Another approach to generating the input to the model checker is followed
by [25,21]. In that work, a transition system specification is generated directly
from the given graph grammar. This approach could allow us to benefit from
built-in optimizations or on-the-fly techniques, because the construction of the
transition system is done inside the model checker. It also allows the use of different
modeling tools, like AGG or PROGRES [1, 22], which support attributed graph
transformation systems. These options are currently under investigation.

Another line of research is the generalization of the basic theory of graph 
transformation systems concerning independence and critical pairs, concurrency, 
and synchronization to the stochastic case. In particular, the last issue could be 
relevant for a compositional translation of graph transformation systems into 
transition system specifications which are composed of modules combined by 
synchronous parallel composition. 
Abstract. Model checking is increasingly popular for hardware and,
more recently, software verification. In this paper we describe two
different approaches to extend the benefits of model checking to systems
whose behavior is specified by graph transformation systems. One
approach is to encode the graphs into fixed state vectors and the
transformation rules into guarded commands that modify these state vectors
appropriately, so as to enjoy all the benefits of the years of experience
incorporated in existing model checking tools. The other approach is to simulate
the graph production rules directly and build the state space directly
from the resultant graphs and derivations. This avoids the preprocessing
phase, and makes additional abstraction techniques available to handle
symmetries and dynamic allocation.

In this paper we compare these approaches on the basis of three case 
studies elaborated in both of them, and we evaluate the results. Our 
conclusion is that the first approach outperforms the second if the 
dynamic and/or symmetric nature of the problem under analysis is 
limited, while the second shows its superiority for inherently dynamic 
and symmetric problems. 

Keywords: logic properties of graphs and transformations, analysis of 
transformation systems, semantics of visual techniques, model checking 



1 Introduction 

Graph transformation [6, 18] represents a rich line of research in computer sci- 
ence. Recently, a wide range of applications have been found especially in the 
theoretical foundations of diagrammatic specification formalisms such as UML. 
The main advantage of using graph transformation lies in the fact that not only 
the (static) program state of these UML-related models can be stored as graphs, 
but it is quite obvious and natural to define the evolution of these models by 
transformations on those graphs. 
However, software engineers may introduce bugs into the system under design
even if they use such a high-level and executable specification methodology as
graph transformation. In this respect, one has to verify automatically and with
mathematical precision that the system model fulfills all its requirements.

Model checking is one of the few verification techniques that, in some areas of 
computer science, have shown their benefits in practice and have been adopted 
by industry. However, the successes are mainly limited to hardware verification. 
It has been long recognized that software has features that make the problem 
inherently harder. Primary among those features is the dynamic nature of soft- 
ware, which typically relies heavily upon the dynamic allocation and deallocation 
of portions of memory to data structures (the heap) and control flow (the stack) . 

We argue in the paper that the strengths of graph transformation lie
precisely where the weaknesses of current model checking approaches lie:
namely, in the description of the dynamic nature of software. We have therefore
sought to combine the two, by using graph transformations for the specification,
and model checking for the verification of systems. This paper describes
and compares two quite different approaches towards this goal, namely,
CheckVML [20,24] and GROOVE [13,16].

We have chosen these two approaches to the model checking problem for graph
transformations for two reasons: a) they represent
the two obvious main roads (i.e. to compile graphs into an off-the-shelf tool or
to write a state space generator for graphs), and b) currently, they have the most
extensive tool support.

Related work on model checking graph transformations. The theoretical basics
of verifying graph transformation systems by model checking have been studied
thoroughly by Heckel et al. in [9] (and subsequent papers). The authors propose
that graphs can be interpreted as states and rule applications as transitions in
a transition system, an idea that is used in both approaches in this paper.

A theoretical framework by Baldan et al. [2] aims at analyzing a special
class of hypergraph rewriting systems by a static analysis technique based on
approximative foldings and unfoldings of a special class of Petri nets. Recently,
this work has been extended in [1] to provide a precise (McMillan-style)
unfolding strategy. This is essentially different from both approaches discussed in
the current paper in that symmetric situations are only identified on a single
path (thus they might be investigated several times on different paths). But
detecting that a certain situation has already been examined on a single path
can be much cheaper in general compared to total isomorphism checks (as done
in GROOVE).

Dotti et al. [5] use object-based graph grammars for modeling object-oriented
systems and define a translation into SPIN to carry out model checking. The
main difference (in contrast to CheckVML) is that the authors allow a restricted
structure for graph transformation rules that is tailored to model message calls
in object-oriented systems. Therefore, CheckVML is more general from a pure
graph transformation perspective (i.e. any kind of rule is allowed). However,
the framework of [5] relies on higher-level SPIN/Promela constructs (processes
and channels), which might result in better run-time performance.
Structure of the paper. The rest of the paper is structured as follows. Section 2
introduces the basic concepts of graph transformation systems and model checking
on a motivating example. Sections 3 and 4 provide an overview of the CheckVML
and the GROOVE approach, respectively. We present the results of three
case studies in Sec. 5. Finally, Section 6 concludes the paper.



2 Model Checking Graph Transformation Systems 

2.1 A Motivating Example: The Concurrent Append Problem 

As a motivating running example for the paper, we consider the “Concurrent 
Append” problem for the Java program listed in Fig. 1, which implements an 
append method on a list of cells. Given an integer value x as parameter, the 
program appends a new tail cell to the list if x is not contained in any of the 
existing cells. An example correctness criterion is that the list of cells must not 
contain the same value more than once. However, we allow that different threads 
may access the list concurrently by calling the append method, which might 
result in undesired race conditions without certain assumptions on atomicity in 
case of the Java program below. 



class Cell {
    Cell next;
    int val;

    // Appends x as a new tail cell unless x already occurs in the list.
    void append(int x) {
        if (x == this.val)
            return;
        else if (this.next == null) {
            this.next = new Cell();
            this.next.val = x;
        } else
            this.next.append(x);
    }
}



Fig. 1. A Java program and its metamodel/type graph. 



In the paper, we model this problem by using typed graphs [3] (or metamodels
in UML terms) for describing the static structure. For instance, the metamodel
in Fig. 1 expresses that a node of type Int may be connected to a node of type Cell
via an edge of type val (which corresponds directly to the Java attribute
val). Furthermore, a next edge leads from a cell to the next cell in the
list (if there is any). Each invocation of the append method is denoted by an
Append node where we register the this pointer (which points to a cell), the caller
invocation (which is another Append node), and the return value (of type Void) by
edges of corresponding types. Finally, the program counter in each invocation of
the append method is denoted by a control loop (self-edge).

All valid instance graphs (or models in UML terms) that represent specific
invocations of the append method should comply with this metamodel in a
type-conforming way for both nodes and edges.

2.2 An Informal Introduction to Graph Transformation 

The dynamic behavior of the recursive append method is captured by graph 
transformation rules. Graph transformation provides a visual, rule and pattern- 
based manipulation of graph models with solid mathematical foundations [6,18]. 

A graph transformation rule r consists of a left-hand side (LHS) and a
right-hand side (RHS) graph and, potentially, some negative application conditions
(NAC), which are traditionally denoted by (red) crosses. Informally, the execution
of a rule on a given host graph G (i) finds a matching of the LHS in G, (ii) checks
whether the matching can be extended to a matching of the NAC (in which case
the original matching of the LHS is invalid), (iii) removes all the graph elements
from G which have an image in the LHS but not in the RHS, and (iv) creates new
graph elements and embeds them into G to provide an image for rule elements
that appear only in the RHS but not in the LHS. In other terms, the LHS and
NAC graphs denote the precondition while the RHS denotes the postcondition
for rule application.
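The four steps can be sketched in code on a very simple graph representation. The following is a minimal illustration, not the algorithm of GROOVE or CheckVML: graphs are dictionaries of typed nodes plus sets of labeled edges, matching is brute-force and injective, dangling edges are silently removed, and the rule at the end is a simplified stand-in for the Append rule. All class and function names are assumptions of this example.

```python
from itertools import count, permutations

class Graph:
    def __init__(self, nodes, edges=()):
        self.nodes = dict(nodes)   # node id -> node type
        self.edges = set(edges)    # (source id, edge label, target id)

def matches(pattern, host, fixed=None):
    """Yield injective, type-preserving matches of pattern in host that extend `fixed`."""
    fixed = dict(fixed or {})
    free = [n for n in pattern.nodes if n not in fixed]
    candidates = [h for h in host.nodes if h not in fixed.values()]
    for image in permutations(candidates, len(free)):
        m = {**fixed, **dict(zip(free, image))}
        nodes_ok = all(pattern.nodes[n] == host.nodes[m[n]] for n in pattern.nodes)
        edges_ok = all((m[s], l, m[t]) in host.edges for s, l, t in pattern.edges)
        if nodes_ok and edges_ok:
            yield m

def apply_rule(lhs, rhs, nac, host, m):
    """Apply a rule at match m (found in step i); return None if the NAC blocks it."""
    # (ii) the match is invalid if it extends to a match of the NAC
    if nac is not None and next(matches(nac, host, m), None) is not None:
        return None
    nodes, edges, m = dict(host.nodes), set(host.edges), dict(m)
    # (iii) remove images of elements that occur in the LHS but not in the RHS
    edges -= {(m[s], l, m[t]) for s, l, t in lhs.edges - rhs.edges}
    for n in lhs.nodes.keys() - rhs.nodes.keys():
        del nodes[m[n]]
    edges = {e for e in edges if e[0] in nodes and e[2] in nodes}  # drop dangling edges
    # (iv) create elements that occur in the RHS but not in the LHS
    fresh = (f"n{i}" for i in count())
    for n in sorted(rhs.nodes.keys() - lhs.nodes.keys()):
        m[n] = next(name for name in fresh if name not in nodes)
        nodes[m[n]] = rhs.nodes[n]
    edges |= {(m[s], l, m[t]) for s, l, t in rhs.edges - lhs.edges}
    return Graph(nodes, edges)

# Simplified, hypothetical rule: add a next cell to a cell that has none.
lhs = Graph({"c": "Cell"})
nac = Graph({"c": "Cell", "d": "Cell"}, {("c", "next", "d")})
rhs = Graph({"c": "Cell", "n": "Cell"}, {("c", "next", "n")})
host = Graph({"c1": "Cell", "c2": "Cell"}, {("c1", "next", "c2")})
results = [apply_rule(lhs, rhs, nac, host, m) for m in matches(lhs, host)]
applied = [g for g in results if g is not None]
```

In the example, the match on `c1` is rejected by the NAC because `c1` already has a next cell, so the rule applies only at `c2` and produces a list of three cells.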

In the paper, we use the rule notation of GROOVE (which is very similar to
the notation used in Fujaba [12]), which abbreviates the different LHS, RHS
and NAC rule graphs into a single graph with the following conventions:

— Reader nodes and edges (i.e. elements that are part of LHS and RHS) are 
shown in solid thin (black) lines 

— Eraser elements (that are part of the LHS but not the RHS) are depicted in 
dashed (blue) lines. 

— Creator elements (that are part of the RHS but not the LHS) are depicted 
in solid thick (green) lines. 

— Embargo elements (from the NAC) are shown in dotted (red) lines. 

A sample graph transformation rule stating how to append a new element to
the end of the cell list is depicted in Fig. 2 in both the traditional and the
GROOVE notation.¹ The dynamic behavior of this highly recursive append
problem is defined by four graph transformation rules (see Figs. 2 and 3).

Append a New Cell. Rule Append is responsible for appending a new cell 
to the list if the control reaches the last cell (see the negative condition 
inhibiting the existence of a next edge pointing to a Cell) and the value 
stored at this last cell is not equal to the method parameter. Furthermore, 
the append method returns one level up in the recursive call hierarchy as 
simulated by removing the bottom-most Append node and adding a return 
edge. 

1 Note that node identities are not allowed in the GROOVE tool; we only use them
for presentation reasons.

Fig. 2. The Append graph transformation rule in different notations.



Go to Next Cell. Rule Next checks whether the method parameter differs from
the value stored at the current cell and, if so, makes a recursive call
to check the next cell by generating a new Append node and passing the
control to it.

Value Found in List. Rule Found checks whether the method parameter is equal to
the value stored at the current cell and, if so, returns the control to its caller
append invocation node p.





[Figure: rule Return, with node labels Void and Append (twice) and edge labels :return and :caller.]



Fig. 3. Additional graph transformation rules for the concurrent append problem. 
Return Result. Finally, rule Return simply removes an append invocation node
(from the stack of recursive calls) if it has already calculated the result.



2.3 The Model Checking Problem of Graph Transformation Systems

The model checking problem is to automatically decide whether a certain correct- 
ness property holds in a given system by systematically traversing all enabled 
transitions in all states (thus all possible execution paths) of the system. The 
correctness properties are frequently formalized as LTL formulae. 

In graph transformation systems, a state is a graph, while a transition cor- 
responds to the application of a rule for a certain matching of the left hand side 
in such a graph. Traversing all enabled transitions then means applying all rules 
on all possible matchings. During this process, it is important to realize whether 
a certain state has been investigated before; therefore the model checker has to 
store all the graphs that it has encountered. Furthermore, ideally a model checker 
should exploit the symmetric nature of a problem by investigating isomorphic 
situations only once. The two approaches compared in the paper introduce very 
different techniques to tackle these problems. 

For the current paper, we restrict our investigation to the verification of
safety and reachability properties. A safety property defines a desired property
that should always hold on every execution path or (equivalently) an undesired
situation which should never hold on any execution path (which we will call
a danger property below). A reachability property describes, on the contrary,
a desired situation which should be reached along at least one execution path.
From a verification point of view, safety and reachability properties are dual: the 
refutation of a safety property is a counter-example which satisfies the reach- 
ability property obtained as the negation of the safety property. On the other 
hand, if a safety property holds (or a reachability property is refuted) the model 
checker has to traverse the entire state space. 
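The duality can be illustrated with a small explicit-state search that either returns a counter-example trace to a danger state or exhausts the state space. The model below is a deliberately simplified, hypothetical abstraction of the concurrent append problem (two threads that each check for the value and then write it as separate steps); it is not the graph transformation model of the paper, and all names are assumptions.

```python
from collections import deque

def find_reachable(initial, successors, danger):
    """Breadth-first search for a state satisfying `danger`.

    Returns the list of labels leading to such a state (a counter-example to
    the safety property), or None once the whole state space is traversed.
    """
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if danger(state):
            trace = []
            while parent[state] is not None:
                previous, label = parent[state]
                trace.append(label)
                state = previous
            return trace[::-1]
        for label, nxt in successors(state):
            if nxt not in parent:          # each state is investigated once
                parent[nxt] = (state, label)
                queue.append(nxt)
    return None

X = 1  # value that both hypothetical threads try to append

def successors(state):
    cells, pcs = state
    for i, pc in enumerate(pcs):
        if pc == "check":
            new_pc = "done" if X in cells else "write"
            yield f"check{i}", (cells, pcs[:i] + (new_pc,) + pcs[i + 1:])
        elif pc == "write":
            yield f"write{i}", (cells + (X,), pcs[:i] + ("done",) + pcs[i + 1:])

def has_duplicate(state):
    cells, _ = state
    return len(set(cells)) < len(cells)

initial = ((0,), ("check", "check"))
trace = find_reachable(initial, successors, has_duplicate)
```

The search finds a trace of four steps in which both threads pass the check before either of them writes, which is exactly the kind of race the correctness criterion is meant to exclude. If the danger predicate is never satisfied, the function returns `None` only after visiting every reachable state.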

A safety or reachability property can be interpreted as a special graph pat- 
tern (called property graph in the sequel) which immediately terminates the 
verification process if it is matched successfully. We have shown in [14] that the 
properties expressible in this way are equivalent to the $\exists\neg\exists$ fragment of ($\forall$-free) 
first order logic with binary predicates. For instance, the property that there 
exists an element that is shared between two list cells, expressed by the first-order 
logic property $\exists v{:}\mathit{Int},\ c_1, c_2{:}\mathit{Cell} \,.\ \mathit{val}(c_1, v) \wedge \mathit{val}(c_2, v) \wedge c_1 \neq c_2$, is alternatively 
encoded in the left graph of Figure 4. 
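
As an illustration of how such a property graph is read, the following minimal sketch checks the Shared pattern directly on a host graph: two distinct cells with a val edge to the same Int-object. The representation (sets of node names and a set of val pairs) and the function name `shared_matches` are assumptions made for this example only; neither tool is claimed to work this way.

```python
from itertools import permutations

def shared_matches(cells, ints, val):
    """Return all (c1, c2, v) with val(c1, v), val(c2, v) and c1 != c2.

    cells, ints: sets of node names; val: set of (cell, int) pairs.
    """
    return [(c1, c2, v)
            for c1, c2 in permutations(sorted(cells), 2)  # c1 != c2 by construction
            for v in sorted(ints)
            if (c1, v) in val and (c2, v) in val]

# Hypothetical host graphs: one safe, one in which v1 is shared.
cells = {"c1", "c2", "c3"}
ints = {"v1", "v2"}
val_ok = {("c1", "v1"), ("c2", "v2")}
val_bad = {("c1", "v1"), ("c2", "v1"), ("c3", "v2")}

print(shared_matches(cells, ints, val_ok))
print(shared_matches(cells, ints, val_bad))
```

A non-empty result means the danger pattern is matched, so verification would stop with this state as a counter-example.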

The other property graphs in Fig. 4 are Isolated stating that every Int-object 
is either a method parameter or contained in the list, and Terminated expressing 
that there are no Append-methods left. Since different interleavings of append 
method calls access the list concurrently, we need model checking to ensure that 
these properties hold. 
[Figure: three panels, Shared, Isolated and Terminated, drawn over nodes c1:Cell, c2:Cell, a:Append, c:Cell and v:Int with val and x edges.] 

Fig. 4. Danger and reachability property graphs. 



3 The CheckVML Approach 

Main concepts. The main idea of the CheckVML approach [20,23,24] is to exploit 
off-the-shelf model checker tools like SPIN [11] for the verification of graph 
transformation systems. More specifically, it translates a graph transformation 
system parameterized with a type graph and an initial graph (via an abstract 
transition system representation) into its Promela equivalent to carry out the 
formal analysis in SPIN. Furthermore, property graphs are also translated into 
their temporal logic equivalents. 

Traditional model checkers are based on so-called Kripke structures, which 
are state-transition models where the structure of a state consists of a subset 
of a finite universe of propositions. This determines the storage structures used 
(usually Binary Decision Diagrams or a variant thereof), the logic used to express 
properties (propositional logic extended with temporal operators, usually LTL 
or CTL) and the model checking algorithms (automata-based or tableau-based). 

Since graph transformation is a meta-level specification paradigm (i.e. it defines 
how each instance of a type graph should behave) while the Kripke structure 
(transition system) formalism of Promela is a model-level specification language 
(i.e. a Promela model describes how a specific model should behave), the main 
challenge in this approach is rule instantiation, i.e. generating one Promela 
transition for each potential application of a graph transformation rule in a 
preprocessing phase at compile time. 
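
The idea of rule instantiation can be sketched as follows: every injective assignment of the rule's LHS variables to node identifiers of the right type yields one ground transition. This is only an illustration under our own assumptions; the function `instantiate`, the variable names and the identifier pools are hypothetical, and no Promela code is produced.

```python
def instantiate(rule_name, lhs_vars, universe):
    """Return one ground transition per injective assignment of LHS variables.

    lhs_vars: list of (variable, type) pairs.
    universe: dict mapping each type to the list of node ids that may exist
              (for dynamic types, up to the a priori upper bound).
    """
    names = [var for var, _ in lhs_vars]
    pools = [universe[typ] for _, typ in lhs_vars]

    def assign(i, chosen):
        if i == len(names):
            yield dict(zip(names, chosen))
            return
        for node in pools[i]:
            if node not in chosen:          # injective matching
                yield from assign(i + 1, chosen + [node])

    return [(rule_name, match) for match in assign(0, [])]

# Hypothetical rule with one Append frame and two distinct cells in its LHS.
universe = {"Append": ["a1", "a2"], "Cell": ["c1", "c2", "c3"]}
lhs = [("a", "Append"), ("c", "Cell"), ("n", "Cell")]
transitions = instantiate("next", lhs, universe)
print(len(transitions))
```

The number of ground transitions grows with the product of the pool sizes, which is why dynamic elements that are not restricted by static constraints inflate the generated model.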

The potential benefits of the CheckVML approach are the following: 

1. It considers typed and attributed graphs, which fit well with the metamodeling 
philosophy of UML and other modeling languages. 

2. The size of the state vector depends only on the dynamic model elements 
(i.e., elements that can be altered by at least one graph transformation rule) 
while immutable static parts of a model are not stored in the state vector. 
This is a typical case for data-flow like systems (dataflow networks, Petri 
nets, etc). 

3. It can be easily adapted to various back-end model checker tools. 

The essential disadvantage of the approach is that dynamic model elements 
(that are not restricted by static constraints) easily blow up both the verification 
model and state space; moreover, symmetries in graphs can be handled in only 
very limited cases. Further research is needed in these directions. 
Graphs and transformation rules. CheckVML uses directed, typed and attributed 
graphs (or MOF metamodels and models) as model representation (see 
the example presented in Sec. 2.1-2.2). Inheritance between node types is also 
supported. 

Concerning the rule application strategy, CheckVML prescribes that a match- 
ing in the host graph should be an injective occurrence of the LHS (and NAC) 
graphs. Arbitrary creation and deletion of edges are allowed while there is an a 
priori upper bound for the number of nodes (of a certain type) potentially cre- 
ated during a verification run, which is passed as a parameter to the translator. 
Moreover, all dangling edges are implicitly removed when deleting a node. 



New (unpublished) features. Several new features of CheckVML have been added 
as an incremental improvement since the previous papers [20,24]. In order to 
improve performance, the entire tool has been rewritten, and the translator now 
uses relational database technology for generating all potential matches. 

The main novelty is the automated translation of property graphs into LTL 
formulae; thus the users do not need SPIN-specific knowledge for stating properties. 
Since property graphs denote safety or reachability properties, this 
translation should find all potential matchings of the property graph in a similar way 
as is done for instantiating rules. 
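
One plausible shape for such a translation is sketched below. It assumes (our assumption, not a statement about the tool's actual output) that each potential matching becomes a conjunction of atomic propositions, that the matchings are combined by disjunction, and that a danger property is wrapped as "always not" while a reachability property is wrapped as "eventually". The proposition names are hypothetical; the operators `[]`, `<>`, `!`, `&&` and `||` are those of SPIN's LTL syntax.

```python
def property_to_ltl(matches, kind):
    """Build an LTL formula string from the matchings of a property graph.

    matches: list of matchings, each a list of atomic proposition names.
    kind: "danger" (pattern must never occur) or "reachability".
    """
    if matches:
        disj = " || ".join("(" + " && ".join(m) + ")" for m in matches)
    else:
        disj = "false"                      # the pattern can never be matched
    if kind == "danger":
        return "[] !(" + disj + ")"
    if kind == "reachability":
        return "<> (" + disj + ")"
    raise ValueError("unknown property kind: " + kind)

# Hypothetical propositions: val_c1_v1 holds iff cell c1 has a val edge to v1.
shared = [["val_c1_v1", "val_c2_v1"], ["val_c1_v2", "val_c2_v2"]]
print(property_to_ltl(shared, "danger"))
```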

Furthermore, in order to handle certain isomorphic situations, node identifiers 
have been ordered and made reusable. When a new node is created, the 
smallest available identifier is assigned to it; therefore, the same identifier can be 
reassigned several times. As a result, certain (but not all) isomorphic host graphs 
are handled only once. 
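
The identifier policy can be illustrated with a small allocator. The class `NodeIds` and its methods are hypothetical names for this sketch; it only shows why reusing the smallest free identifier makes some create/delete sequences end in the same state.

```python
class NodeIds:
    """Allocate node identifiers from 0..bound-1, always reusing the smallest free one."""

    def __init__(self, bound):
        self.bound = bound      # a priori upper bound on the number of nodes
        self.used = set()

    def create(self):
        for i in range(self.bound):
            if i not in self.used:
                self.used.add(i)
                return i
        raise RuntimeError("a priori node bound exceeded")

    def delete(self, i):
        self.used.remove(i)

ids = NodeIds(bound=3)
a, b, c = ids.create(), ids.create(), ids.create()
ids.delete(b)
reused = ids.create()           # receives the identifier that b had
print(a, b, c, reused)
```

With fresh identifiers, deleting a node and creating a new one would lead to a state that differs from the earlier one only in the node's name; with reuse, both sequences reach the same encoded state.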

Input / Output formats. CheckVML uses the GXL format [21] to store all host 
graphs, rule graphs (LHS, RHS, NAC) and property graphs. An XML configu- 
ration file is responsible for declaring the role of a certain graph (rule, host or 
property), and the user can set several translation parameters as well (e.g. upper 
bound for nodes of a certain type). In the near future, we plan to port Check- 
VML to a graph transformation tool with visual graph and rule editing facilities. 
The AGG tool [8] is a primary candidate due to the similarities between both 
the graph models and XML formats. 

CheckVML generates a Promela model by instantiating rules on the host 
graph, and the SPIN representation of LTL formulae (which can be copy-pasted 
into the XSPIN framework). As a result, the users can work with high-level 
graph models and no (significant) SPIN-specific knowledge is required for modeling. 
However, counter-examples obtained as results of a verification run are 
currently available only in SPIN (for instance, in the form of scenarios/sequence 
diagrams); therefore, SPIN-specific knowledge is required for the interpretation 
of analysis results. In the future, we also plan to investigate the possibilities of 
back-annotating analysis results so that they could be simulated (played back) in 
a graph transformation tool. Unfortunately, existing graph transformation tools 
provide very little support for importing entire execution traces. 
Note that the overall ideas behind the CheckVML approach are not restricted 
to SPIN. In fact, thanks to a recent extension, CheckVML also produces an XML 
format for the generated transition system. Since the majority of model checker 
tools use transition systems as the underlying mathematical model (naturally, in 
their own dialect), this XML output can easily be adapted to various back-end 
model checkers, e.g. by XSLT scripts. 

4 The GROOVE Approach 

Main concepts. The idea behind the GROOVE approach (see [15] for further 
details on the project and downloads) is to use the core concepts of graphs and 
graph transformations all the way through during model checking. This means 
that states are explicitly represented and stored as graphs, and transitions as 
applications of graph transformation rules; moreover, properties to be checked 
should be specified in a graph-based logic, and graph-specific model checking 
algorithms should be applied. 

This approach implies that very little of the theory and tool development 
for traditional model checkers can be applied immediately, since the most basic 
concept, namely the underlying model, has been extended drastically. 

Currently only the state space generation part of GROOVE has been fully 
implemented. However, by the nature of graph transformation, this already im- 
plies the ability to express and check safety and reachability of graph properties, 
since they can be formulated as rules with an identity morphism. Such a rule 
is applicable (idempotently) at precisely those states where the property holds. 
It is then straightforward to use such properties in controlling the state space 
generation process. 

In particular, when treating a safety/danger property as an invariant, the 
state space generation halts with unexplored states exactly if the property is 
violated; when treating the inverse of a reachability property as an invariant, it 
halts precisely if the property is satisfied. 

The GROOVE state space generator implements the process described in 
Sec. 2 to match each newly generated state against existing states up to iso- 
morphism. While an isomorphism check is in principle quite expensive, for the 
examples we have worked out it stays within practical bounds. 
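
The combination of state space generation, matching of new states against known ones up to isomorphism, and halting on an invariant violation can be sketched as below. This is a toy under our own assumptions: graphs are pairs (nodes, edges) with labelled edges, node labels are simulated by self-edges, and the isomorphism check is a brute-force canonical form that is only feasible for very small graphs. It is not the algorithm implemented in GROOVE.

```python
from collections import deque
from itertools import permutations

def canonical(graph):
    """Canonical form by brute force over all node renamings (tiny graphs only)."""
    nodes, edges = graph
    order = sorted(nodes)
    best = None
    for perm in permutations(range(len(order))):
        rename = dict(zip(order, perm))
        form = tuple(sorted((rename[s], lab, rename[t]) for s, lab, t in edges))
        if best is None or form < best:
            best = form
    return (len(order), best)

def explore(initial, successors, violates, up_to_iso=True):
    """Breadth-first generation; returns (number of stored states, violating state or None)."""
    key = canonical if up_to_iso else (lambda g: g)
    seen = {key(initial)}
    queue = deque([initial])
    while queue:
        g = queue.popleft()
        if violates(g):
            return len(seen), g             # halt, possibly with unexplored states
        for h in successors(g):
            k = key(h)
            if k not in seen:
                seen.add(k)
                queue.append(h)
    return len(seen), None

def ring(n):
    """Ring of n nodes with one token, encoded as a 'tok' self-edge on node 0."""
    nodes = frozenset(range(n))
    edges = frozenset({(i, "next", (i + 1) % n) for i in range(n)} | {(0, "tok", 0)})
    return (nodes, edges)

def move_token(graph):
    """Rule: pass the token along a 'next' edge."""
    nodes, edges = graph
    for s, lab, t in edges:
        if lab == "next" and (s, "tok", s) in edges:
            yield (nodes, (edges - {(s, "tok", s)}) | {(t, "tok", t)})

def two_tokens(graph):
    """Danger property: more than one token in the ring."""
    return sum(1 for _, lab, _ in graph[1] if lab == "tok") > 1

print(explore(ring(4), move_token, two_tokens))
print(explore(ring(4), move_token, two_tokens, up_to_iso=False))
```

On this ring all token positions are isomorphic, so a single state is stored when states are compared up to isomorphism, against four when they are compared by identity.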

The potential benefits of the GROOVE approach are the following: 

1. There is no a priori upper bound to the size of the graphs; 

2. There is an implicit symmetry check through the identification of isomorphic 
graphs; 

3. No pre- or post-processing is necessary to apply the GROOVE tool to a 
given graph transformation system, or to translate the results of the model 
checking back into graphs; 

4. Existing graph transformation theory can be directly brought to bear upon 
the tool, for instance, to discover rule independence or local confluence. 

The essential disadvantage of this approach is that the huge body of existing 
research in traditional model checking is only indirectly applicable. In each of 
the areas where this applies, we aim to develop alternative techniques that are 
based directly on graphs. 

1. Storage techniques (e.g., Binary Decision Diagrams). Rather than storing 
each graph anew, we store only the differences with the graph that it was 
derived from, in terms of the nodes and edges added and removed. This 
does mean that the actual graph has to be reconstructed when it is needed, 
e.g., for checking isomorphism; to alleviate the resulting time penalty this 
minimal representation is combined with caching. 

2. State space reduction techniques, such as partial order reduction and abstraction. 
For state space reduction, we intend to use confluence properties 
of graph transformation rules (see advantage 4 above), or graph abstraction 
in the sense of shape graphs (see [19]). A first step towards the latter was 
reported in [17]. 

3. Logics and model checking algorithms. To replace the propositional logic used 
in traditional model checking, we have proposed a predicate graph logic 
in [13] for the purpose of formulating the properties to be checked. Some 
preliminary ideas on model checking such properties can be found in [4]. 
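
The storage technique of item 1 can be sketched as follows: each state records only the edges added and removed with respect to its parent, and the full graph is rebuilt on demand and cached. The class `DeltaStore` is a hypothetical illustration; the cache here is unbounded and the reconstruction is recursive, both simplifications that a real implementation would have to revisit.

```python
class DeltaStore:
    """Store each graph as a difference (added, removed edges) from its parent state."""

    def __init__(self, initial_edges):
        self.parent = [None]
        self.delta = [(frozenset(initial_edges), frozenset())]
        self._cache = {}

    def derive(self, parent_id, added=(), removed=()):
        """Register a successor of parent_id and return its state id."""
        self.parent.append(parent_id)
        self.delta.append((frozenset(added), frozenset(removed)))
        return len(self.parent) - 1

    def graph(self, state_id):
        """Reconstruct the edge set of a state, reusing cached ancestors."""
        if state_id in self._cache:
            return self._cache[state_id]
        added, removed = self.delta[state_id]
        p = self.parent[state_id]
        base = frozenset() if p is None else self.graph(p)
        edges = (base - removed) | added
        self._cache[state_id] = edges
        return edges

store = DeltaStore({("c1", "val", "v1")})
s1 = store.derive(0, added={("c1", "next", "c2")})
s2 = store.derive(s1, removed={("c1", "val", "v1")})
print(sorted(store.graph(s2)))
```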



Graphs and transformation rules. GROOVE uses untyped, non-attributed, edge- 
labeled graphs without parallel edges. Node labels are not supported; however, 
we simulate them using self-edges (which indeed are also depicted by writing the 
labels inside the nodes). Furthermore, GROOVE implements the single pushout 
rewrite approach [7] (which means that dangling edges are removed while non- 
injective matching of the LHSs is allowed). It supports the use of negative ap- 
plication conditions. These can be used to specify, among other things, injectiv- 
ity constraints; thus we can also simulate transformation systems in which the 
matchings are intended to be injective. 

For the purpose of graph transformation, the lack of typing in GROOVE 
is not a serious drawback, since type information is not used to control the 
transformation process (although it may be used to optimize it). The absence of 
attributes is a potentially greater drawback. The examples presented here have 
been chosen such that attributes do not play a significant role, and so they can 
be simulated using ordinary edges. In fact, an extension to “true” attributes is 
not planned; rather, we plan to interpret data values as a special class of nodes, 
with ordinary edges pointing to them, as in [10]. 

Input/output formats. GROOVE uses the GXL format [21] to store host graphs 
and rules. Each rule is saved as a single graph, combining the information in 
LHS, RHS and NACs by adding structure on the edges (in the form of a prefix) 
that indicates their role - or, in the case of nodes, by adding special edges for 
this purpose. A graph transformation system consists of all the rules in a single 
directory as well as its subdirectories (which are treated as separate namespaces, 
thus giving rise to a simple hierarchy of rules). In the future we plan to support 
the special-purpose format GTXL (see [22]). 
Table 1. Feature comparison for CheckVML and GROOVE. 

Aspects of comparison                         GROOVE                  CheckVML
------------------------------------------------------------------------------------------------
Graph model     Directed graphs               +                       +
                Labeled graphs                +                       -
                Typed and attributed graphs   -                       +
GT rules        NAC                           +                       +
                Node creation                 arbitrary number        a priori upper bound
                Edge creation/removal         +                       +
                Dangling edges                removed                 removed
                Pattern matching              non-injective           injective
Input / Output  Graphical input (editor)      +                       -
                XML input                     +                       +
                Graphical output (trace)      built-in                MSCs in XSPIN
                XML output                    +                       +
                Property to be proved         graph constraint        graph constraint or LTL in SPIN
                                              safety / reachability   safety / reachability
Verification    Exploration strategies        extensible library      SPIN
                Symmetry recognition          graph isomorphism       reusable object ids
                Preprocessing                 none                    translation to SPIN


Alternatively, the GROOVE tool packages a stand-alone graph editor that 
can be used to construct graphs and rules and save them in the required format, 
or to read and edit graphs obtained from elsewhere. 

State transition systems generated as a result of state space generation are 
also saved as GXL graphs, in which the nodes correspond to states (hence, 
graphs) and the edges to rule applications, labeled by the rule names. 

State spaces can be generated either using a graphical simulator or using a 
command-line tool. 

— The simulator, described before in [16], supports state space traversal by 
allowing the user to select and apply rules and matchings, all the while 
building up the transition system. Alternatively, the user can apply one of 
the available automatic state space exploration strategies (branching, linear, 
bounded, invariant). Graphs and transition systems can be inspected by 
showing and hiding edges based on regular expressions over their labels. 

— The command-line tool applies a pre-chosen strategy and generates and saves 
the resulting transition system. 

Finally, Table 1 provides a brief summarizing comparison of the two tools. 

5 Experimental Comparison 

We have carried out three different case studies to compare the two model checking 
approaches, namely, (1) the Concurrent Append example of the current paper, 
(2) the dining philosophers problem as discussed in [24], and (3) a mutual 
exclusion resource allocation example taken from [9]. In the following we briefly 
describe the salient features of these cases. 

Dining philosophers = Symmetries + No dynamic allocation. We have chosen 
this example because it is a traditional one, which has already been subject of a 
study for the CheckVML approach. For the purpose of GROOVE, this is an interesting 
case because with n philosophers, the example obviously has symmetry 
degree n, and this should then also be the reduction factor in the number of states 
and transitions. On the other hand, the example has no dynamic allocation, and 
in this sense is not typical of the sort of problem for which we expect a graph 
transformation-based approach to be superior to traditional model checkers. We 
checked a safety property stating that no fork is ever held by more than one 
philosopher. 

Concurrent append = Dynamic allocation + No symmetries. This is the running 
example of the paper. We have chosen it because it combines features that we 
believe to be typical of the “hard” problems in software verification. On the one 
hand, it contains dynamic allocation (list cells are created and append method 
frames are created and deleted), and on the other hand, it specifies concurrent 
behavior (several append methods are running in parallel). Note that, in the 
representation chosen here, the example has few non-trivial symmetries. In par- 
ticular, all Int-objects in the list are distinguished by their value. We checked 
the property expressing that the list of cells is not allowed to contain the same 
value more than once. 

Mutual exclusion = Dynamic allocation + Symmetries. In this example, pro- 
cesses try to access shared resources by using a token ring. We have chosen this 
example because it combines dynamic allocation (processes and resources can be 
created and deleted arbitrarily) and symmetry (processes and resources cannot 
be distinguished from one another). Moreover, a graph-based description of the 
protocol is very natural: an argument can be made that the specification of this 
protocol using graph transformation rules is superior to any other. The verified 
requirement was that at most one process may be allowed to access each resource 
at a time. 

Of the examples presented here, this is the only one for which the state 
space is actually infinite (there is no upper bound to the numbers of processes 
and resources). Therefore, an artificial upper bound has to be imposed for the 
purpose of state space generation. 

Results. In Table 2, we compare (a) the number of states traversed by the model 
checker during a successful verification run, (b) the number of transitions in the 
(reachable) state space, (c) the size of the memory footprint of the state space, and 
(d) the execution time for the verification run. Furthermore, we also present the 
preprocessing time required for CheckVML to translate graph transformation 
systems into SPIN and the size of state vectors in SPIN. 

We have done our best to produce the results of both approaches on an equal 
basis. We briefly list the characteristics of the experiments: 

Memory Usage and Run-Time Performance. Experiments were run on a 
3 GHz Pentium IV processor with 1 GB of memory. For the GROOVE 
experiments, the Java Virtual Machine was started with an initial memory size 
of 100 MB and a maximum size of 1 GB. Although the space used for the 
actual storage of the state space is under 200 MB for all the cases reported 
here, during state space generation the tool relies heavily on caching, and 
limiting the amount of available memory dramatically worsens the run-time 
performance. 

Table 2. Comparison of verification runs for CheckVML+SPIN and GROOVE. 

                entities     preproc    vector   states     transitions   memory   run time
--------------------------------------------------------------------------------------------
DinPhil         #            s          #bits    #          #             MB       s
CheckVML+SPIN   3            3.8        36       57         125           2.6      0.2
                4            4.5        48       181        554           2.6      0.2
                5            5.0        60       603        2,397         2.6      0.2
                8            6.6        112      25,961     171,058       8.8      0.6
                10           9.1        156      328,503    2,711,200     90.8     7.5
                12           out of memory (for SPIN)
GROOVE          3            -          -        17         41            0.0      0.1
                4            -          -        45         148           0.0      0.2
                5            -          -        117        481           0.0      0.5
                8            -          -        3,261      21,536        1.7      13.6
                10           -          -        32,903     271,634       41.8     199.5
                12           -          -        106,329    965,589       74.2     793.3
--------------------------------------------------------------------------------------------
Append          App:Cell     (Append calls and cells initially present in the system)
CheckVML+SPIN   2:3 (orig)   out of memory (SPIN)
                2:3 (mod)    15.3       200      22         169           2.6      0.5
                2:5 (mod)    117.9      316      86         395           2.6      1.1
                3:5 (mod)    1,021.0    520      3,311      5,764         37.0     40.0
                rest         out of time (for CheckVML)
GROOVE          2:3          -          -        57         116           0.0      0.3
                2:5          -          -        145        292           0.0      0.6
                3:5          -          -        1,125      3,163         0.4      4.4
                3:7          -          -        2,716      7,768         1.0      13.0
                4:8          -          -        31,104     116,658       12.4     212.1
--------------------------------------------------------------------------------------------
Mutex           pr:res:new
CheckVML+SPIN   2:2:0        6.1        44       5,772      38,557        2.8      1.3
                3:2:0        18.5       60       697,004    6,843,310     83.2     14.7
                rest         24.3-180   at least 70 minutes (execution aborted)
GROOVE          2:2:0        -          -        8,384      15,936        2.3      4.2
                3:2:0        -          -        262,054    620,284       79.1     162.6
                3:3:0        out of memory at around 1 million states
                2:0:2        -          -        11,692     22,675        3.1      5.5
                2:0:3        -          -        515,134    1,206,935     155.6    361.8

Notation: pr is the number of processes initially present in the system; 
res is the number of resources initially present in the system; 
new is the upper bound for additional resources and additional processes. 

Bounding the State Space. For the mutual exclusion example, we had to 
put a bound on the state space (as mentioned above). The two tools implement 
this bound differently. In GROOVE, all states which violate 
the bounding constraint are first generated and added to the transition system, 
after which the violation is detected and they are ignored for further 
exploration. In the CheckVML approach, on the other hand, the violation 
is checked first and hence those states are not generated at all. It turns out 
that the “spurious” states in the GROOVE results comprise about 85% of 
the state space and about 25% of the number of transitions. 
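
The effect of the two bounding strategies on the state count can be seen in a toy system, sketched here under our own assumptions: a state is a pair (processes, resources), two rules create a process or a resource, and the bound limits their sum. The function `explore` and its `check_first` flag are hypothetical and do not reproduce either tool.

```python
def explore(bound, check_first):
    """Count stored states for a toy system with two node-creating rules.

    check_first=True : states violating the bound are never generated.
    check_first=False: violating states are generated and stored, then
                       ignored for further exploration.
    """
    start = (0, 0)
    seen, frontier = {start}, [start]
    while frontier:
        p, r = frontier.pop()
        if p + r > bound:
            continue                        # spurious state: stored but not expanded
        for nxt in ((p + 1, r), (p, r + 1)):
            if check_first and sum(nxt) > bound:
                continue                    # violation checked before generation
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return len(seen)

print(explore(2, check_first=True))
print(explore(2, check_first=False))
```

In this toy the second strategy stores one extra layer of states just beyond the bound; the proportion of such states depends entirely on the system under study.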

Activating vs. Creating Nodes. For the original concurrent append example, 
SPIN failed even on very small examples due to the fact that each node 
and edge type is dynamic (CheckVML generates the cross-product of all nodes 
and edges in the preprocessing phase even though the number of edges is only 
linear in the number of nodes). However, verification times for CheckVML + 
SPIN could be reduced by a modeling trick, i.e. altering the models and the 
rules by adding an isActive attribute for each node type and only activating 
a node by changing this attribute instead of “real” node creation. This way, 
many graph elements that were originally dynamic are turned into static 
elements and thus abstracted by CheckVML during preprocessing. This example 
thus also demonstrates some pros and cons of graph attributes. 
However, the experimental results for the two approaches are not directly 
comparable (as denoted by the “(mod)” postfix after the append test cases 
in Table 2). 

Evaluation. Based on Table 2, we come to the following overall conclusions: 

— The space needed to store the transition system generated by both tools is 
comparable. Yet the techniques are very different: for GROOVE it is based 
on storing the differences between successive states, in terms of nodes and 
edges added and removed, whereas SPIN (and hence CheckVML) stores 
states as bit vectors that encode the entire graphs. 

— The time needed to generate the state spaces is in a different order of magnitude: 
on the cases reported here, CheckVML typically takes under a tenth 
of the time that GROOVE does. For this we offer three possible explanations: 
(a) SPIN clearly shows the benefits of a more mature technology: over 
a decade of research has gone into improving its implementation; (b) over 
the years, SPIN has been heavily optimized towards its implementation in 
C, whereas GROOVE has been implemented entirely in Java; (c) the approach 
taken by GROOVE, involving explicit graph matching and graph 
isomorphism checks, is inherently more complex. 

— For each of the problems studied, the GROOVE approach can handle a larger 
dimension than the CheckVML approach (a difference that is unquestionably 
significant for the append and mutual exclusion examples). This shows 
that the potential advantages of the approach, in terms of symmetry checking 
and dealing with dynamic allocation, also really show up in practice. 

6 Conclusions 

In the paper, we tackled the problem of model checking graph transformation systems 
by two different approaches. CheckVML exploits traditional model checking 
techniques for verification by translating graph transformation systems into 
SPIN, an off-the-shelf model checker. GROOVE, on the other hand, uses the 
core concepts of graphs and graph transformations all the way through during 
model checking. 

We compared the two approaches on three case studies having essentially 
different characteristics concerning the dynamic and symmetric nature of the 
problem. Our overall conclusion is the following: 

— If the problem analyzed lends itself well to being modeled in SPIN, that is, if 
dynamic allocation and/or symmetries are limited, it is to be expected that 
the CheckVML approach will always remain superior. 

— On the other hand, for problems that are inherently dynamic, the GROOVE 
approach is a promising alternative. 

Our conclusions also imply certain directions for future work. Obviously, 
CheckVML would yield a much more succinct state vector if further constraints 
on the metamodels (such as multiplicities) were handled in the preprocessing 
phase. For GROOVE, it is an interesting issue to make isomorphism checks 
optional (thus serving as an intelligent compression technique). However, the 
main line of research should find sophisticated abstraction techniques especially 
for infinite state graph transformation systems. 
Abstract. We examine the power and limitations of the weakest vertex 
relabelling system, which allows the label of a vertex to be changed as a function 
of its own label and of the label of one of its neighbours. We characterise 
the graphs for which two important distributed algorithmic problems are 
solvable in this model: naming and election. 



1 Introduction 



The role of the local computation mechanisms is fundamental for delimiting 
the borderline between positive and negative results in distributed computation. 
Understanding the power of local computations in different models enhances 
our understanding of basic distributed algorithms. Yamashita and Kameda [11], 
Boldi et al. [3], Mazurkiewicz [8] and Chalopin and Métivier [4] characterise 
families of graphs in which election is possible under different models of distributed 
computations. Even if these results cover a broad class of models, there 
are still a few natural models which have not yet been examined. We consider here 
one such model, where an elementary computation step modifies the state 
of one network vertex and this modification depends on its current state and 
on the state of one of its neighbours. We solve, in this model, two important 
algorithmic problems: the election problem and the naming problem, which turn 
out not to be equivalent. We give the characterisation of graphs which admit 
distributed solutions for both problems in this model. 

To this end we find suitable graph morphisms that allow the necessary conditions 
to be formulated conveniently, in the spirit of Angluin [1]. It turns out that 
in our case the relevant morphisms are graph submersions. The presented conditions 
are also sufficient: algorithms, inspired by Mazurkiewicz [8], are given that 
solve the naming and the election problems for the corresponding graphs. 
1.1 Our Model 

A network of processors will be represented as a connected undirected graph 
G = (V(G),E(G)) without self-loops or multiple edges. As usual the vertices 
represent processors and edges direct communication links. The state of each 
processor is represented by the label λ(v) of the corresponding vertex. An elementary 
computation step will be represented by relabelling rules of the form 
given schematically in Figure 1. Computations using only this type of 
relabelling rule are called cellular edge local computations in this paper. Thus 
an algorithm in our model is simply given by some (possibly infinite but always 
recursive) set of rules of the type presented in Figure 1. A run of the algorithm 
consists in applying the relabelling rules specified by the algorithm until no rule 
is applicable, which terminates the execution. The relabelling rules are applied 
asynchronously and in any order, which means that given the initial labelling 
usually many different runs are possible. 

  X     Y             X'    Y
  •-----o    ---->    •-----o

Fig. 1. Graphical form of a rule for cellular edge local computations. If in a graph G 
there is a vertex labelled X with a neighbour labelled Y then applying this rule we 
replace X by a new label X'. The labels of all other graph vertices are irrelevant for 
such a computation step and remain unchanged. The vertex of G changing the label 
will be called active and filled with black, the neighbour vertex used to match the rule 
is called passive and marked as unfilled on the figure. All the other vertices of G not 
participating in such elementary relabelling step are called idle. 
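
To make the model concrete, the following minimal sketch simulates a run of a cellular edge local computation. It rests on assumptions of ours: the set of rules is given as a function `rule(x, y)` returning the new label of the active vertex or `None` if no rule applies, and the asynchronous scheduler is modelled by a seeded random choice among applicable steps. The example rule `spread` is hypothetical and only serves to show termination.

```python
import random

def run(adj, labels, rule, seed=0):
    """Apply relabelling rules until none is applicable.

    adj: dict vertex -> list of neighbours (undirected graph).
    labels: dict vertex -> initial label.
    rule(x, y): new label for an active vertex labelled x with a passive
                neighbour labelled y, or None if no rule applies.
    Returns the final labelling and the number of steps performed.
    """
    rng = random.Random(seed)
    labels = dict(labels)
    steps = 0
    while True:
        moves = [(v, rule(labels[v], labels[u]))
                 for v in adj for u in adj[v]
                 if rule(labels[v], labels[u]) is not None]
        if not moves:
            return labels, steps            # no rule applicable: the run terminates
        v, new_label = rng.choice(moves)    # only the active vertex is relabelled
        labels[v] = new_label
        steps += 1

def spread(x, y):
    """Hypothetical rule: a vertex labelled N next to an A becomes A."""
    return "A" if (x, y) == ("N", "A") else None

# Path graph 0 - 1 - 2 - 3 with a single A at one end.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
initial = {0: "A", 1: "N", 2: "N", 3: "N"}
final, steps = run(adj, initial, spread)
print(final, steps)
```

Different seeds give different runs, which reflects the remark that many runs are possible from the same initial labelling; for this particular rule every run ends in the same final labelling.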



1.2 Election, Naming and Enumeration 

The election problem is one of the paradigms of the theory of distributed com- 
puting. It was first posed by LeLann [6]. A distributed algorithm solves the 
election problem if it always terminates and in the final configuration exactly 
one processor is marked as elected and all the other processors are non elected. 
Moreover, it is supposed that once a processor becomes elected or non elected 
then it remains in such a state until the end of the algorithm. Elections con- 
stitute a building block of many other distributed algorithms since the elected 
vertex can be subsequently used to make some centralised decisions, to initialise 
some other activity, to centralise or to broadcast information etc. 

The generic conditions listed above, required for an election algorithm, have 
a direct translation in our model: we are looking for a relabelling system where 
each run terminates with exactly one vertex labelled elected and all the other 
vertices labelled non-elected. Again we require that no rule may change 
either an elected or a non-elected label. 

The naming problem is another important problem in the theory of distributed 
computing. The aim of a naming algorithm is to arrive at a final configuration 
where all processors have unique identities. Being able to give unique 
identities to all processors dynamically and in a distributed way is very important, 
since many distributed algorithms work correctly only under the assumption that 
all processors can be unambiguously identified. 

The enumeration problem is a variant of the naming problem. The aim of 
a distributed enumeration algorithm is to attribute to each network vertex a 
unique integer in such a way that this yields a bijection between the set V(G) 
of vertices and {1,2,..., |V(G)|}. 

We also distinguish two kinds of termination: the implicit one that simply 
means that the algorithm always terminates, and the explicit one that means 
that at least one node can detect that the algorithm has terminated. Obviously, 
if we can solve the naming problem with an explicit termination then we can 
also elect, for example the vertex with the smallest or the greatest identity. 

The naming and the election problems are often equivalent for various computational 
models [8,4]; however, this is not the case for our model. It turns 
out that in our model the class of graphs for which naming is solvable admits a 
simple and elegant characterisation; unfortunately a similar characterisation for 
the election problem is quite involved. 

1.3 Overview of Our Results 

Under the model of cellular edge local computations, we present a complete
characterisation of graphs for which naming and election are possible:
Theorems 7 and 11. The problems are solved constructively: we present naming
and election algorithms that work correctly for all graphs where these problems
are solvable. Space limitations do not allow us to present the correctness
proofs for our algorithms.

1.4 Related Works 

The election problem has already been studied in a great variety of models
[2,7,10]. The proposed algorithms depend on the type of the basic computation
steps; they work correctly only for a particular type of network topology (tree,
grid, torus, ring with a known prime number of vertices etc.) or assume that
some initial extra knowledge is available to processors.

Yamashita and Kameda [11] consider the model where, in each step, one 
of the vertices, depending on its current label, either changes the label, or 
sends/receives a message via one of its ports. They proved that there exists 
an election algorithm for G if and only if the symmetricity of G is equal to 1, 
where the symmetricity depends on the number of labelled trees isomorphic to 
a certain tree associated with G ([11], Theorem 1 p. 75). 

Mazurkiewicz [8] considers the asynchronous computation model presented 
in Figure 2. His characterisation of the graphs where enumeration/election are 
possible is based on the notion of non ambiguous graphs and may be formu- 
lated equivalently using coverings [5]. He gives a nice and simple enumeration 
algorithm for the graphs minimal for the covering relation. 

Fig. 2. In the model of Mazurkiewicz [8] a chosen vertex can change its label together
with all its neighbours. The relabelling rules therefore have the form presented in this
figure. Note that this involves a much greater degree of synchronisation than in the
systems that we examine in our paper.

Boldi et al. [3] consider a model where the network is a directed multigraph
G and, contrary to our model, they also allow arc labellings. When a processor
is activated, it changes its state depending on its previous state and on the
states of its ingoing neighbours; the outgoing neighbours do not participate in
such an elementary computation step. They investigate two modes of computation,
synchronous and asynchronous, while in our paper only asynchronous computations
are examined. In their study, they use fibrations, which are generalisations of
coverings. Boldi et al. [3] prove that there exists an election algorithm in
their model for a graph G if and only if G is not properly fibred over another
graph H (for the asynchronous case, they only consider discrete fibrations). To
obtain this characterisation, they use the same mechanism as Yamashita and
Kameda: each node computes its own view and then the node with the weakest view
is elected.

In [4], three different asynchronous models are examined. Schematically, the
rules of all three models are presented in Figure 3. Note that, contrary to the
model we examine in the present paper, all these models allow edge labelling.
It turns out that for all models described in Figure 3 naming and election are
equivalent. In [4], it is proved that for all models described in Figure 3 the
election and naming problems can be solved on a graph G if and only if G is
not a covering of any graph H not isomorphic to G, where H can have multiple
edges but no self-loops.

We can note that, although the model studied in this paper and model A
in Figure 3 seem to be very close, the characterisations of graphs for which the
naming problem and the election problem can be solved in these models are very
different. The intuitive reason is that if edges can be labelled then each
processor can subsequently identify its neighbours consistently. On the other
hand, in the model that we examine here, since edges are no longer labelled, a
vertex can never know whether it synchronises with the same neighbour or
another one.

2 Preliminaries 

We consider finite, undirected, connected graphs $G = (V(G), E(G))$ with vertices
$V(G)$ and edges $E(G)$ without multiple edges or self-loops. Two vertices $u$ and
$v$ are said to be adjacent or neighbours if $\{u, v\}$ is an edge of $G$ (thus $u$
and $v$ are necessarily distinct since no self-loop is admitted) and $N_G(v)$ will
stand for the set of neighbours of $v$. An edge $e$ is incident to a vertex $v$ if
$v \in e$ and $I_G(v)$ will stand for the set of all the edges of $G$ incident to
$v$. The degree of a vertex $v$, denoted $d_G(v)$, is the number of edges incident
with $v$.

Fig. 3. Elementary relabelling steps for the three models examined in [4]
(panels: Model A, Model B, Model C).

A homomorphism between graphs $G$ and $H$ is a mapping $\gamma : V(G) \to V(H)$
such that if $\{u, v\} \in E(G)$ then $\{\gamma(u), \gamma(v)\} \in E(H)$. Since our
graphs do not have self-loops, this implies that $\gamma(u) \neq \gamma(v)$ whenever
$u$ and $v$ are adjacent.

We say that $\gamma$ is an isomorphism if $\gamma$ is bijective and $\gamma^{-1}$ is
a homomorphism. A class of graphs will be any set of graphs containing all graphs
isomorphic to some of its elements. A graph $H$ is a subgraph of $G$, noted
$H \subseteq G$, if $V(H) \subseteq V(G)$ and $E(H) \subseteq E(G)$. An occurrence of
$H$ in $G$ is an isomorphism $\gamma$ between $H$ and a subgraph $H'$ of $G$.

For any set $S$, $|S|$ denotes the cardinality of $S$ while $\mathcal{P}_{fin}(S)$ is
the set of finite subsets of $S$. For any integer $q$, we denote by $[1, q]$ the set
of integers $\{1, 2, \ldots, q\}$.

Throughout the paper we will consider graphs where vertices are labelled
with labels from a recursive label set $L$. A graph labelled over $L$ is a couple
$\mathbf{G} = (G, \lambda)$, where $G$ is an underlying non-labelled graph and
$\lambda : V(G) \to L$ is a (vertex) labelling function. The class of graphs
labelled by $L$ will be denoted by $\mathcal{G}_L$.

Let $H$ be a subgraph of $G$ and $\lambda_H$ the restriction of a labelling
$\lambda : V(G) \to L$ to $V(H)$. Then the labelled graph
$\mathbf{H} = (H, \lambda_H)$ is called a subgraph of $\mathbf{G} = (G, \lambda)$;
we note this fact by $\mathbf{H} \subseteq \mathbf{G}$. A homomorphism of labelled
graphs is just a labelling-preserving homomorphism of underlying unlabelled graphs.

Submersions are locally surjective graph morphisms: 

Definition 1. A graph $G$ is a submersion of a graph $H$ via a morphism
$\gamma : G \to H$ if, for all $v \in V(G)$, $\gamma$ is surjective on the
neighbourhood $N_G(v)$, that is, $\gamma(N_G(v)) = N_H(\gamma(v))$. The graph $G$
is a proper submersion of $H$ if $\gamma$ is not an isomorphism; $G$ is
submersion-minimal if $G$ is not a proper submersion of any other graph.
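As an illustration of Definition 1, the following sketch checks local surjectivity directly. It assumes simple undirected graphs given as a vertex list and an edge list; the helper names `neighbours`, `is_submersion` and `ring` are hypothetical. Note that the equality tested for each vertex already forces the mapping to be a homomorphism.

```python
def neighbours(edges, v):
    """Set of neighbours of v in a simple undirected graph given as an edge list."""
    return {b if a == v else a for (a, b) in edges if v in (a, b)}


def is_submersion(g_vertices, g_edges, h_vertices, h_edges, gamma):
    """Check gamma(N_G(v)) == N_H(gamma(v)) for every vertex v of G."""
    for v in g_vertices:
        if gamma[v] not in h_vertices:
            return False
        image = {gamma[w] for w in neighbours(g_edges, v)}
        if image != neighbours(h_edges, gamma[v]):
            return False
    return True


def ring(n):
    """Vertex list and edge list of the ring with n vertices 0, ..., n-1."""
    return list(range(n)), [(i, (i + 1) % n) for i in range(n)]


# The ring of size 6 is a submersion of the ring of size 3 via i -> i mod 3.
print(is_submersion(*ring(6), *ring(3), {i: i % 3 for i in range(6)}))
```

The same mapping `i -> i mod 3` fails on the ring of size 5, because both neighbours of vertex 0 are sent to the same vertex.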

Naturally, submersions of labelled graphs are just submersions of underlying 
unlabelled graphs preserving the labelling. 

For any set $\mathcal{R}$ of edge local relabelling rules of the type described in
Figure 1 we shall write $\mathbf{G} \mathrel{\mathcal{R}} \mathbf{G'}$ if
$\mathbf{G'}$ can be obtained from $\mathbf{G}$ by applying a rule of $\mathcal{R}$
on some edge of $\mathbf{G}$. Obviously, $\mathbf{G}$ and $\mathbf{G'}$ have the same
underlying graph $G$; only the labelling changes for exactly one (active) vertex.
Thus, slightly abusing the notation, $\mathcal{R}$ will stand both for a set of
rules and the induced relabelling relation over labelled graphs. The transitive
closure of such a relabelling relation is noted $\mathcal{R}^*$.

Fig. 4. The labelled graph G is a submersion of H via the mapping $\gamma$ which maps
each vertex of G labelled i to the unique vertex of H with the same label i. This
submersion is proper and the graph H is itself submersion-minimal.

The relation $\mathcal{R}$ is called noetherian on a graph $\mathbf{G}$ if there is
no infinite relabelling sequence
$\mathbf{G}_0 \mathrel{\mathcal{R}} \mathbf{G}_1 \mathrel{\mathcal{R}} \cdots$ with
$\mathbf{G}_0 = \mathbf{G}$. The relation $\mathcal{R}$ is noetherian on a set of
graphs if it is noetherian on each graph of the set. Finally, the relation
$\mathcal{R}$ is called noetherian if it is noetherian on each graph.

Clearly, noetherian relations encode always-terminating algorithms.

The following simple observation exhibits a strong link between submersions 
and cellular edge local relabellings. This is a counterpart of the lifting lemma of 
Angluin [1] adapted to submersions. 

Lemma 2 (Lifting Lemma). Let $\mathcal{R}$ be a cellular edge locally generated
relabelling relation and let $\mathbf{G}$ be a submersion of $\mathbf{H}$. If
$\mathbf{H} \mathrel{\mathcal{R}^*} \mathbf{H'}$ then there exists $\mathbf{G'}$
such that $\mathbf{G} \mathrel{\mathcal{R}^*} \mathbf{G'}$ and $\mathbf{G'}$ is a
submersion of $\mathbf{H'}$.

Proof. It is sufficient to prove the lemma for one step of the relabelling. Let
$\varphi : \mathbf{G} \to \mathbf{H}$ be a submersion, $\mathbf{G} = (G, \lambda)$,
$\mathbf{H} = (H, \nu)$. Suppose that a cellular edge rule is applied to an active
vertex $v \in V(H)$ yielding a new labelling $\nu'$ on $H$. Then, since $\varphi$ is
a submersion, all vertices of $\varphi^{-1}(v)$ are pairwise non-adjacent and
therefore we can apply the same relabelling rule to all vertices of
$\varphi^{-1}(v)$ in $G$, in any order. This yields a labelling $\lambda'$ on $G$
such that $\varphi : (G, \lambda') \to (H, \nu')$ remains a submersion. Note that we
have simulated here one relabelling step in $\mathbf{H}$ by several relabellings in
$\mathbf{G}$ that use the same rule. □

3 Enumeration and Naming Problems 

We prove that there exists no naming algorithm and no enumeration algorithm
on a graph G using cellular edge local computations if the graph is not
submersion-minimal. The proof is analogous to that of Angluin [1].

Proposition 3. Let G be a labelled graph which is not submersion-minimal. 
There is no naming algorithm for G and no enumeration algorithm for G using 
cellular edge local computations. 

Proof. Let $\mathbf{H}$ be a labelled graph not isomorphic to $\mathbf{G}$ such that
$\mathbf{G}$ is a submersion of $\mathbf{H}$ via $\varphi$. For every cellular edge
local algorithm $\mathcal{R}$, consider an execution of $\mathcal{R}$ on $\mathbf{H}$
that leads to a final configuration $\mathbf{H'}$. From Lemma 2, there exists an
execution of $\mathcal{R}$ on $\mathbf{G}$ such that the final configuration
$\mathbf{G'} = (G, \lambda')$ is a submersion of $\mathbf{H'}$. Since $\mathbf{G'}$
is not isomorphic to $\mathbf{H'}$, there exist distinct $v, v' \in V(G)$ such that
$\lambda'(v) = \lambda'(v')$. Consequently, $\mathcal{R}$ does not solve either the
naming or the enumeration problem on $\mathbf{G}$. □

3.1 An Enumeration Algorithm 

In this section, we describe a Mazurkiewicz-like algorithm $\mathcal{M}$ using
cellular edge local computations that solves the enumeration problem on a
submersion-minimal graph $G$.

Each vertex v attempts to get its own number between 1 and |V(G)|. A 
vertex chooses a number and exchanges its number with its neighbours. If a 
vertex u discovers the existence of another vertex v with the same number, then 
it compares its local view (the numbers of its neighbours) with the local view of v. 
If the label of u or the local view of u is “weaker”, then u chooses another number
and broadcasts it again with its local view. At the end of the computation, every 
vertex will have a unique number if the graph is submersion-minimal. 

We consider a graph $\mathbf{G} = (G, \lambda)$ with an initial labelling
$\lambda : V(G) \to L$. During the computation each vertex $v \in V(G)$ will acquire
new labels of the form $(\lambda(v), n(v), N(v), M(v))$, where:

— the first component $\lambda(v)$ is just the initial label (and thus remains fixed
during the computation),

— $n(v) \in \mathbb{N}$ is the current identity number of $v$ computed by the
algorithm,

— $N(v) \in \mathcal{P}_{fin}(\mathbb{N})$ is the local view of $v$. Intuitively, the
algorithm will try to update the current view in such a way that $N(v)$ will
consist of the current identities of the neighbours of $v$. Therefore $N(v)$ will
always be a finite (possibly empty) set of integers,

— $M(v) \subseteq \mathbb{N} \times L \times \mathcal{P}_{fin}(\mathbb{N})$ is the
current mailbox of $v$. It contains the whole information received by $v$ during
the computation.

The fundamental property of the algorithm is based on a total order on the
set $\mathcal{P}_{fin}(\mathbb{N})$ of local views, as defined by Mazurkiewicz [8].

Let $N_1, N_2 \in \mathcal{P}_{fin}(\mathbb{N})$, $N_1 \neq N_2$. Then
$N_1 \prec N_2$ if the maximal element of the symmetric difference
$N_1 \triangle N_2 = (N_1 \setminus N_2) \cup (N_2 \setminus N_1)$ belongs to $N_2$.
Note that in particular the empty set is minimal for $\prec$.

It can be helpful to note that the order $\prec$ is just a reincarnation of the usual
lexicographic order. Let $n_1, n_2, \ldots, n_k$ and $m_1, m_2, \ldots, m_l$ be all
elements of $N_1$ and $N_2$ respectively, listed in decreasing order (decreasing for
the usual order over integers): $n_1 > n_2 > \cdots > n_k$ and
$m_1 > m_2 > \cdots > m_l$. Then $N_1 \prec N_2$ iff either (i) $k < l$ and for all
$i$, $1 \le i \le k$, $n_i = m_i$ or (ii) $n_i < m_i$ where $i$ is the smallest index
such that $n_i \neq m_i$.

If N(u) -< N(v) then we say that the local view N(v) of v is stronger than 
the one of u (and N(u) is weaker than N(v)). 
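The two descriptions of the order on local views can be compared mechanically. The sketch below assumes views are Python sets of integers; `view_less` follows the symmetric-difference definition and `view_less_lex` the lexicographic reading (Python compares lists lexicographically, with a proper prefix being smaller). Both function names are hypothetical.

```python
from itertools import combinations


def view_less(n1, n2):
    """N1 < N2 iff the maximum of the symmetric difference belongs to N2."""
    diff = set(n1) ^ set(n2)
    return bool(diff) and max(diff) in n2


def view_less_lex(n1, n2):
    """The same order, via lexicographic comparison of the decreasing listings."""
    return sorted(n1, reverse=True) < sorted(n2, reverse=True)


# Exhaustive comparison of both definitions on all subsets of {1, ..., 5}.
subsets = [frozenset(c) for r in range(6) for c in combinations(range(1, 6), r)]
assert all(view_less(a, b) == view_less_lex(a, b) for a in subsets for b in subsets)
```

For instance, {1, 3} is weaker than {2, 3} because the maximum of the symmetric difference {1, 2} lies in the second set.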

We assume for the rest of this paper that the set of labels $L$ is totally ordered
by $<_L$. Finally, we extend $\prec$ to a total order on
$L \times \mathcal{P}_{fin}(\mathbb{N})$: $(\ell, N) \prec (\ell', N')$ if
either $\ell <_L \ell'$ or ($\ell = \ell'$ and $N \prec N'$).

Occasionally we shall use the reflexive closure $\preceq$ of $\prec$.

We describe here the relabelling rules that define the enumeration algorithm.

First of all, to launch the algorithm there is a special initial rule
$\mathcal{M}_0$ that just extends the initial label $\lambda(v)$ of each vertex $v$
to $(\lambda(v), 0, \emptyset, \emptyset)$. The rules $\mathcal{M}_1$ and
$\mathcal{M}_2$ are close to the rules used by Mazurkiewicz [8]. The first rule
$\mathcal{M}_1$ enables a vertex to update its mailbox by looking at the mailbox of
one of its neighbours:

$\mathcal{M}_1$ : $(\ell_1, n_1, N_1, M_1)\ \bullet\!-\!\circ\ (\ell_2, n_2, N_2, M_2) \;\longrightarrow\; (\ell_1, n_1, N_1, M_1')\ \bullet\!-\!\circ\ (\ell_2, n_2, N_2, M_2)$

If $M_2 \setminus M_1 \neq \emptyset$ then $M_1' := M_1 \cup M_2$.

The second rule $\mathcal{M}_2$ does not involve any synchronisation with a neighbour
vertex. It enables a vertex $v$ to change its identity if the current identity number
$n(v)$ is 0 or if the mailbox of $v$ contains a message from a vertex with the same
identity but with a stronger label or a stronger local view.

$\mathcal{M}_2$ : $(\ell, n, N, M)\ \bullet \;\longrightarrow\; (\ell, k, N, M')\ \bullet$

If $n = 0$ or there exists $(n, \ell', N') \in M$ such that
$(\ell, N) \prec (\ell', N')$ then
$k := 1 + \max\{n' \mid \exists (n', \ell', N') \in M\}$ and
$M' := M \cup \{(k, \ell, N)\}$.

(In the formula above we assume that max of an empty set is 0.)
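The first two rules can be sketched as pure functions on labels. The sketch assumes that a label is a tuple `(l, n, N, M)` with `N` a frozenset of integers and `M` a frozenset of triples `(n, l, N)`, and that Python's `<` on the initial labels stands in for the order on L. The names `rule_m1`, `rule_m2`, `view_less` and `pair_less` are hypothetical; each rule function returns the new label of the active vertex, or `None` when the rule is not applicable.

```python
def view_less(n1, n2):
    """Order on local views: the maximum of the symmetric difference lies in n2."""
    diff = n1 ^ n2
    return bool(diff) and max(diff) in n2


def pair_less(l1, n1, l2, n2):
    """(l1, N1) < (l2, N2): compare labels first, then local views."""
    return l1 < l2 or (l1 == l2 and view_less(n1, n2))


def rule_m1(active, neighbour):
    """M1: the active vertex merges the neighbour's mailbox into its own."""
    l1, n1, view1, mailbox1 = active
    mailbox2 = neighbour[3]
    if not (mailbox2 - mailbox1):
        return None  # nothing new to learn
    return (l1, n1, view1, mailbox1 | mailbox2)


def rule_m2(label):
    """M2: pick a fresh number if n is 0 or a stronger vertex uses the same n."""
    l, n, view, mailbox = label
    stronger = any(n2 == n and pair_less(l, view, l2, v2)
                   for (n2, l2, v2) in mailbox)
    if n != 0 and not stronger:
        return None
    # max over an empty mailbox is taken to be 0, as in the text
    k = 1 + max((n2 for (n2, _, _) in mailbox), default=0)
    return (l, k, view, mailbox | {(k, l, view)})
```

A vertex labelled "a" with number 1 that learns, via `rule_m1`, of a vertex labelled "b" with the same number is then moved to number 2 by `rule_m2`, since "a" is the weaker label.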

The third rule $\mathcal{M}_3$ allows a vertex $v$ to change its current identity
when it has a neighbour $v'$ with exactly the same current label (all four
components should be identical). Moreover, at the same step, the identity $n(v')$ of
the neighbour $v'$ of $v$ is inserted into the local view $N(v)$ and at the same time
all the elements $m$ of $N(v)$ such that $m < n(v')$ are deleted from the local view.
The rationale behind this deletion step is explained with the rule $\mathcal{M}_4$
below.

$\mathcal{M}_3$ : $(\ell, n, N, M)\ \bullet\!-\!\circ\ (\ell, n, N, M) \;\longrightarrow\; (\ell, k, N', M')\ \bullet\!-\!\circ\ (\ell, n, N, M)$

If $n > 0$ and $\forall (n, \ell', N') \in M$, $(\ell', N') \preceq (\ell, N)$ then
$k := 1 + \max\{n' \mid \exists (n', \ell', N') \in M\}$,
$N' := N \setminus \{m \in N \mid m < n\} \cup \{n\}$ and
$M' := M \cup \{(k, \ell, N')\}$.

The fourth rule $\mathcal{M}_4$ enables a vertex $v$ to add the current identity
number $n(v')$ of one of its neighbours to its local view $N(v)$. As for the
preceding rule, all the elements $m$ belonging to $N(v)$ such that $m < n(v')$ are
deleted from the current view.

The intuitive justification for the deletion of all such $m$ is the following.
Suppose that the vertex $v$ synchronises with a neighbour $v'$ and observes that
the current identity number $n(v')$ of $v'$ does not belong to its current view
$N(v)$. Then, since the very purpose of the view $N(v)$ is to store the identity
numbers of all the neighbours, we should add $n(v')$ to the view $N(v)$ of $v$. But
now two cases arise. If $v$ synchronises with $v'$ for the first time then adding
$n(v')$ to the view of $v$ is sufficient. However, it can also be the case that $v$
synchronised with $v'$ in the past and in the meantime $v'$ has changed its identity
number. Then $v$ should not only add the new identity number $n(v')$ to its view
but, to remain in a consistent state, it should also delete the old identity number
of $v'$ from its local view. The trouble is that $v$ has no means to know which of
the numbers present in its view $N(v)$ should be deleted and it is even unable to
decide which of the two cases holds (first synchronisation with $v'$ or not).
However, since our algorithm ensures the monotonicity of subsequent identity numbers
of each vertex, we know that the possible old identity number of $v'$ is less than
the current identity $n(v')$. Therefore, by deleting all $m < n(v')$ from the local
view $N(v)$ we are sure to delete all invalid information. Of course, in this way we
also risk deleting the legitimate current identities of other neighbours of $v$ from
its view $N(v)$. However, this is not a problem since $v$ can recover this
information just by (re)synchronising with all such neighbours.




In the following $(\lambda(v), n_i(v), N_i(v), M_i(v))$ will denote the label of a
vertex $v$ after the $i$th computation step of the algorithm $\mathcal{M}$ given
above.

The algorithm has some remarkable monotonicity properties:

Lemma 4. For each step $i$ and each vertex $v$: (A) $n_i(v) \le n_{i+1}(v)$,
(B) $N_i(v) \preceq N_{i+1}(v)$, and (C) $M_i(v) \subseteq M_{i+1}(v)$. Moreover,
there exists at least one vertex $v$ such that at least one of these
inequalities/inclusions is strict for $v$.

The local knowledge of a vertex $v$ reflects to some extent some real properties
of the current configuration:

Lemma 5. Let $v \in V(G)$. If $(m, \ell, N) \in M_i(v)$ then for some vertex
$w \in V(G)$, $n_i(w) = m$. If $n_i(v) \neq 0$ and $(m', \ell', N') \in M_i(v)$ then,
for every $1 \le m \le m'$, there exist $\ell$ and $N$ such that
$(m, \ell, N) \in M_i(v)$.

These facts allow us to deduce the following properties of the final labelling:

Lemma 6. Any run $\rho$ of the enumeration algorithm on a connected labelled
graph $\mathbf{G} = (G, \lambda)$ terminates and yields a final labelling
$(\lambda, n_\rho, N_\rho, M_\rho)$ satisfying the following conditions:

(1) Let $m$ be the maximal number in the final labelling,
$m = \max\{n_\rho(v) \mid v \in V(G)\}$. Then for every $1 \le p \le m$ there is
some $v \in V(G)$ with $n_\rho(v) = p$,

and for all vertices $v, v'$:

(2) $M_\rho(v) = M_\rho(v')$,

(3) $(n_\rho(v), \lambda(v), N_\rho(v)) \in M_\rho(v')$,

(4) $n_\rho(v) = n_\rho(v')$ implies that $\lambda(v) = \lambda(v')$ and
$N_\rho(v) = N_\rho(v')$,

(5) $n \in N_\rho(v)$ if and only if there exists $w \in N_G(v)$ such that
$n_\rho(w) = n$; in this case, $n_\rho(v) \in N_\rho(w)$.

Under the notation of Lemma 6 we can construct the labelled graph
$\mathbf{H}_\rho$: the vertices of $\mathbf{H}_\rho$ are the integers $n_\rho(v)$,
i.e. final identity numbers, each $n_\rho(v)$ labelled by $\lambda(v)$ (this
labelling is well defined by Lemma 6 (4)) and with edges naturally inherited from
$G$. In fact, the mapping $n_\rho$ is a submersion from $\mathbf{G}$ to
$\mathbf{H}_\rho$. This observation yields:

Theorem 7. For every graph G, the following statements are equivalent: 

(i) there exists a naming algorithm on G using cellular edge local computa- 
tions, 

(ii) there exists a naming algorithm with termination detection on G using 
cellular edge local computations, 

(iii) there exists an enumeration algorithm on G using cellular edge local
computations,

(iv) there exists an enumeration algorithm with termination detection on G 
using cellular edge local computations, 

(v) the graph G is a submersion-minimal graph. 



4 Election Problem 

If we can solve the enumeration problem then we can solve the election problem;
once a vertex gets the identity number |V(G)| we declare it elected.

Nevertheless, in our model, the enumeration and the election problems are 
not equivalent. The graph G in Figure 5 is not submersion-minimal, since the 
morphism from G to H induced by the labelling of G is locally surjective and 
therefore neither the enumeration nor the naming problem can be solved on G. 
But let us execute the preceding algorithm on G. At the end, the vertex labelled 
3 in G will know that it is unique with at least three different neighbours and 
therefore can declare itself as elected. 







Fig. 5. A graph for which we can solve the election problem but not the enumeration 
problem. 



We would like to give here necessary conditions characterising the graphs
with solvable election problem. Given a graph $G$, we denote by $S_G$ the set of
graphs $H$ such that there exists a submersion from $G$ onto $H$. From Lemma 2,
any algorithm $\mathcal{A}$ that solves the election problem on $G$ using cellular
edge local computations will solve the election problem on every graph
$H \in S_G$.

Remark 8. Consider an algorithm $\mathcal{A}$ that solves the election problem on
$G$. Suppose that there exists a subgraph $G'$ of $G$ that is a submersion of a graph
$H \in S_G$ via a morphism $\varphi$. If there exists an execution of $\mathcal{A}$
on $H$ that elects a vertex $v \in V(H)$ such that $|\varphi^{-1}(v)| > 1$, then
there exists an execution of $\mathcal{A}$ on $G'$ such that the label elected
appears at least twice. Since each execution of $\mathcal{A}$ on $G'$ can be
extended to an execution of $\mathcal{A}$ on $G$, there exists an execution of
$\mathcal{A}$ over $G$ that leads to the election of at least two vertices, which
contradicts the choice of $\mathcal{A}$. We can therefore define
$P_H(G', \varphi) = \{v \in V(H) \mid |\varphi^{-1}(v)| > 1\}$, and no execution of
$\mathcal{A}$ on $H$ can elect a vertex $v \in P_H(G', \varphi)$.

Consider a graph $H \in S_G$. Let $P_H(G)$ be the union of all $P_H(G', \varphi)$
for $\varphi$ ranging over all submersions of subgraphs $G'$ of $G$ to $H$ and
$C_H(G) = V(H) \setminus P_H(G)$ (the elements of this set are called the candidates
of $H$ for $G$). From Remark 8, every election algorithm $\mathcal{A}$ over $G$ must
be such that each execution of $\mathcal{A}$ over $H$ elects a vertex in $C_H(G)$.
Consequently, if there exists an election algorithm $\mathcal{A}$ on $G$ then for
every graph $H \in S_G$, $C_H(G) \neq \emptyset$.

Suppose that there exist two disjoint subgraphs $G_1$ and $G_2$ of $G$ such that
$G_1$ (resp. $G_2$) is a submersion of a graph $H_1 \in S_G$ (resp. $H_2 \in S_G$).
Then there does not exist any election algorithm using cellular edge local
computations. Indeed, otherwise, there exists an execution of the algorithm on $G$
such that the label elected appears once in $G_1$ and once in $G_2$, which is
impossible for an election algorithm. Recapitulating:

Proposition 9. Let $\mathbf{G}$ be a labelled graph such that there exists an
election algorithm for $\mathbf{G}$ using cellular edge local computations. Then the
following conditions are satisfied:

1. for every $H \in S_G$, $C_H(G) \neq \emptyset$,

2. there do not exist two disjoint subgraphs $G_1$ and $G_2$ of $G$ such that $G_1$
(resp. $G_2$) is a submersion of a graph $H_1 \in S_G$ (resp. $H_2 \in S_G$).

4.1 An Election Algorithm 

We now consider a graph $G$ satisfying the conditions of Proposition 9.

Our aim is to present an algorithm such that each execution over $G$ will
detect a graph $H \in S_G$ such that there exists a subgraph $G'$ of $G$ that is a
submersion of $H$.

To this end we adapt the enumeration algorithm from the preceding section
and the termination detection algorithm of Szymanski, Shi and Prywes [9].

The idea is to execute the enumeration algorithm given for a graph and to
reconstruct a graph from the mailboxes of the nodes. If the reconstructed graph
is an element of $S_G$, the nodes check if they all agree on this graph.

As in Section 3.1, we start with a labelled graph $\mathbf{G} = (G, \lambda)$.
During the computation vertices $v$ will get new labels of the form
$(\lambda(v), n(v), N(v), M(v), a(v), H(v))$ representing the following information
(again the first component $\lambda(v)$ remains fixed):

— $n(v) \in \mathbb{N}$ is the identity number of the vertex $v$ computed by the
algorithm,

— $a(v) \in \mathbb{N}$ is the confidence level of the vertex $v$,

— $N(v)$ is the local view of $v$. If the vertex $v$ has a neighbour $v'$,
relabelling rules will allow $v$ to add the couple $(n(v'), a(v'))$ to $N(v)$. Thus
$N(v)$ is always a finite set of couples of integers. For
$N \in \mathcal{P}_{fin}(\mathbb{N}^2)$, we note
$\Pi_1(N) = \{n \mid \exists (n, a) \in N\}$ the projection on the first component.

— $M(v) \subseteq \mathbb{N} \times L \times \mathcal{P}_{fin}(\mathbb{N})$ is the
mailbox of $v$ and contains the information received by $v$ about the identity
numbers existing in the graph and the local views associated with these numbers.

— $H(v)$ is the history of the vertex $v$. If at some computation step
$(n, N, M, a) \in H(v)$ then it means that at some previous step the vertex $v$ had
a confidence level equal to $a$ for the value $M$.



The first computation step $\mathcal{S}_0$ just replaces the initial label
$\lambda(v)$ by $(\lambda(v), 0, \emptyset, \emptyset, -1, \emptyset)$. The following
four rules mimic the rules of the enumeration algorithm:




The fifth rule says that if a vertex $v$ detects that all the neighbours it knows
have a confidence level $a \ge a(v)$ then it can increment its own confidence level.
To define this rule we need some additional notation. Given a mailbox content $M$,
for each $n > 0$ we define $\pi_n(M)$ as the set of all triples
$(n, \ell, N) \in M$ with the first component $n$. For each non-empty set
$\pi_n(M)$ we keep in the mailbox only the triple $(n, \ell, N)$ with the greatest
couple $(\ell, N)$ for the order $\prec$. This operation gives a new mailbox content
that we shall note $u(M)$.

The next step consists in defining a graph $G_M$. If there exist
$(n_1, \ell_1, N_1), (n_2, \ell_2, N_2) \in u(M)$ such that $(n_2, \ell_2) \in N_1$
and $(n_1, \ell_1) \notin N_2$ then we set $G_M = (\emptyset, \emptyset)$. Otherwise,
$G_M$ is the graph such that $V(G_M) = \{n \mid (n, \ell, N) \in u(M)\}$ and
$E(G_M) = \{\{n_1, n_2\} \mid \exists (n_1, \ell_1, N_1), (n_2, \ell_2, N_2) \in u(M),\ (n_2, \ell_2) \in N_1 \text{ and } (n_1, \ell_1) \in N_2\}$.
The labelling of $G_M$ is inherited from the set $M$: for $(n, \ell, N) \in u(M)$,
$\lambda_M(n) = \ell$. We will denote by $\mathbf{G}_M = (G_M, \lambda_M)$ the
corresponding labelled graph.

$\mathcal{S}_5$ : $(\ell, n, N, M, a, H)\ \bullet \;\longrightarrow\; (\ell, n, N, M, a + 1, H)\ \bullet$

This rule applies whenever $\forall (n, \ell', N') \in M$,
$(\ell', N') \preceq (\ell, \Pi_1(N))$, $\mathbf{G}_M \in S_G$, and
$\forall (n', a') \in N$, $a \le a'$, and $a \le |V(G)| + 1$.

The sixth rule enables a node $v$ to update its knowledge of the confidence
level of one of its neighbours if the confidence level of this neighbour has
increased.

$\mathcal{S}_6$ : $(\ell_1, n_1, N_1, M, a_1, H_1)\ \bullet\!-\!\circ\ (\ell_2, n_2, N_2, M, a_2, H_2) \;\longrightarrow\; (\ell_1, n_1, N_1', M, a_1, H_1)\ \bullet\!-\!\circ\ (\ell_2, n_2, N_2, M, a_2, H_2)$

If $a_1 > 0$, $\forall (n_2, \ell_2', N_2') \in M$,
$(\ell_2', N_2') \preceq (\ell_2, \Pi_1(N_2))$, and there exists $(n_2, a) \in N_1$
such that $a_2 > a$ then
$N_1' := N_1 \setminus \{(n_2, a)\} \cup \{(n_2, a_2)\}$.

The rule $\mathcal{S}_7$ enables a vertex $v$ to change the value of its mailbox $M$
whenever there exists a neighbour $v'$ that used to have a confidence level $a$
according to $M$ such that $a > a(v) - 1$ and such that its mailbox has changed. If
a vertex changes its mailbox, then it also modifies its history $H(v)$, so as to
remember its former confidence level.

$\mathcal{S}_7$ : $(\ell_1, n_1, N_1, M_1, a_1, H_1)\ \bullet\!-\!\circ\ (\ell_2, n_2, N_2, M_2, a_2, H_2) \;\longrightarrow\; (\ell_1, n_1, N_1', M', -1, H_1')\ \bullet\!-\!\circ\ (\ell_2, n_2, N_2, M_2, a_2, H_2)$

If $\exists (n, \ell, N) \in M_2 \setminus M_1$ and either $a_1 = 0$ or ($a_1 > 0$
and $\exists (n, N, M_1, a) \in H_2$, $\exists (n, a') \in N_1$, $a > a'$) then
$N_1' := \{(n', -1) \mid \exists (n', a) \in N_1\}$, $M' := M_1 \cup M_2$ and
$H_1' := H_1 \cup \{(n_1, N_1, M_1, a_1)\}$.



4.2 Correctness of the Election Algorithm 

In the following $(\lambda(v), n_i(v), N_i(v), M_i(v), a_i(v), H_i(v))$ will stand
for the label of the vertex $v$ after the $i$th computation step of the election
algorithm. The most important property of the algorithm is given in the following
proposition. Roughly speaking, it states that if the confidence level of a vertex
$v$ is $|V(G)| + 2$ then $G$ contains a submersion of $G_{M(v)}$.

Proposition 10. If there exists a vertex $v_0 \in V(G)$ and a step $i_0$ such that
$a_{i_0}(v_0) = |V(G)| + 2$, then $G$ contains a submersion $H$ of
$G_{M_{i_0}(v_0)}$ and for every step $i \ge i_0$ and for every vertex
$v \in V(H)$, $M_i(v) = M_{i_0}(v_0)$.

From Proposition 10 we deduce that if the conditions of Proposition 9 are
satisfied then adding the following rule $\mathcal{S}_8$ allows a unique vertex of
$G$ to be elected. $\mathcal{S}_8$: the label $(\ell, n, N, M, a, H)$ such that
$n = \max\{n' \in C_{G_M}(G)\}$ and $a = |V(G)| + 2$ is replaced by elected. The last
two rules serve to propagate the information that there is an elected vertex:
$\mathcal{S}_9$ transforms the label of a vertex with an elected neighbour into
non-elected and $\mathcal{S}_{10}$ propagates the non-elected label to all
neighbours which are neither elected nor non-elected.

Summarising we get: 

Theorem 11. There exists an election algorithm over a given graph $G$ using
cellular edge local computations if and only if the following conditions are
satisfied:

1. for every $H \in S_G$, $C_H(G) \neq \emptyset$,

2. there do not exist two disjoint subgraphs $G_1$ and $G_2$ of $G$ such that $G_1$
(resp. $G_2$) is a submersion of a graph $H_1 \in S_G$ (resp. $H_2 \in S_G$).

5 Examples 

If we assume that nodes of a graph G have unique identifiers then G is a 
submersion-minimal graph and the knowledge of its size allows an election. 

5.1 Trees, Grids and Complete Graphs 

Consider an unlabelled tree $T$. Since we can colour each tree $T$ with just two
colours, if $T$ has at least 2 vertices such a colouring yields a submersion of $T$
into the graph $K_2$ with two vertices and one edge between them. Such a submersion
is non-trivial if $T$ has at least 3 vertices. Therefore for such trees there does
not exist a naming algorithm using cellular edge local computations.

If $\varphi_1 : T \to K_2$ is a submersion (colouring) of $T$ then exchanging the two
colours we get another submersion $\varphi_2$ and if $T$ has at least three vertices
then for each colour $k \in V(K_2)$ at least one of the sets $\varphi_i^{-1}(k)$,
$i = 1, 2$, has cardinality $\ge 2$. Consequently, the election problem cannot be
solved for trees with more than 2 vertices.
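The colouring argument can be tried out on a small tree. The sketch below assumes the graph is given as an adjacency dictionary; `two_colour` and `is_submersion_onto_k2` are hypothetical helper names. The colouring is computed centrally by breadth-first search, purely to exhibit the submersion onto $K_2$, not as a distributed algorithm.

```python
from collections import deque


def two_colour(adj, root):
    """Breadth-first colouring with colours 0 and 1, starting from root."""
    colour = {root: 0}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in colour:
                colour[w] = 1 - colour[v]
                queue.append(w)
    return colour


def is_submersion_onto_k2(adj, colour):
    """Each vertex needs at least one neighbour, all of the opposite colour."""
    return all(adj[v] and {colour[w] for w in adj[v]} == {1 - colour[v]}
               for v in adj)


# A path 3 - 1 - 0 - 2 viewed as a tree rooted at 0.
tree = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
colouring = two_colour(tree, 0)
print(is_submersion_onto_k2(tree, colouring))
```

On this tree both colour classes contain two vertices, so the submersion is not injective; exchanging the colours gives a second submersion, as in the argument above.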

For the same reasons, square grids, which are also connected and colourable 
with two colours, do not admit either naming or election algorithms in our model. 

Complete graphs are submersion-minimal and therefore admit both naming 
and election in our model. 

5.2 Rings with a Prime Size 

First note the following fact:

Proposition 12. An unlabelled ring of size p is submersion-minimal if and only
if p is prime.

Therefore prime size rings allow both naming and election. This is a quite 
interesting corollary of our general conditions since our model is the weakest 
among graph relabelling systems, with the bare minimal synchronisation power. 
Moreover, contrary to some other algorithms on rings, our enumeration algo- 
rithm does not need any sense of direction for computing agents. 
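Proposition 12 can be checked by brute force on small rings. The sketch below relies on an observation that is not spelled out in the text and should be read as an assumption of the example: a graph is a proper submersion of some graph exactly when its vertex set admits a partition with fewer blocks than vertices in which every block is an independent set and all vertices of a block see the same set of blocks among their neighbours (the blocks are the fibres of the morphism). All function names are hypothetical, and the search is exponential, so it is only meant for very small graphs.

```python
def partitions(items):
    """Yield all set partitions of a list, each as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part


def is_quotient_submersion(adj, blocks):
    """True if mapping each vertex to its block is locally surjective."""
    block_of = {v: i for i, block in enumerate(blocks) for v in block}
    for block in blocks:
        seen = [{block_of[w] for w in adj[v]} for v in block]
        # a block must be an independent set (the quotient has no self-loop)
        if any(block_of[v] in s for v, s in zip(block, seen)):
            return False
        # all vertices of a block must see exactly the same blocks
        if any(s != seen[0] for s in seen):
            return False
    return True


def is_submersion_minimal(adj):
    """True if no partition with fewer blocks than vertices is a submersion."""
    vertices = list(adj)
    return not any(
        len(blocks) < len(vertices) and is_quotient_submersion(adj, blocks)
        for blocks in partitions(vertices)
    )


def ring(n):
    """Adjacency dictionary of the unlabelled ring with n >= 3 vertices."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


print([n for n in range(3, 10) if is_submersion_minimal(ring(n))])
```

For ring sizes 3 to 9 the rings reported as submersion-minimal are those of size 3, 5 and 7; even rings fold onto a single edge and the ring of size 9 folds onto the triangle.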
Abstract. In the context of graph transformation we look at the operation
of switching, which can be viewed as an elegant method for realizing
global transformations of (group-labelled) graphs through local
transformations of the vertices.

Various relatively efficient algorithms exist for deciding whether a graph
can be switched so that it contains some other graph, the query graph,
as an induced subgraph in case vertices are given an identity. However,
when considering graphs up to isomorphism, we immediately run into
the graph isomorphism problem, for which no efficient solution is known.
Surprisingly enough, however, in some cases the decision process can be
simplified by transforming the query graph into a “smaller” graph
without changing the answer. The main lesson learned is that the
dominating factor is not the size of the query graph but its cycle rank.

Although a number of our results hold specifically for undirected, un- 
labelled graphs, we propose a more general framework and give some 
preliminary results for more general cases, where the graphs are labelled 
with elements of a group. 



1 Introduction 

The material in this paper is motivated by a quest for techniques which enable 
the analysis of certain networks of processors. Our starting point is that the 
vertices of a directed graph can be interpreted as processors in a network and 
the edges can be interpreted as the channels/connections between them, labelled 
with values from some (structured) set, call it A, to capture the current state. 
The dynamics of such a network lies in the ability to change the labellings of the 
graph which is done by operations performed by the processors. A major aspect 
of the model here presented is that if a processor performs an input action, it 
influences the labellings of all incoming edges in the same way; the same holds
for the output actions, which govern the outgoing edges. In other words, we have no
separate control over each edge, only over each processor. On the other hand,
actions done by different processors should not interfere with each other, making 
this model an asynchronous one. 

Ehrenfeucht and Rozenberg set forth in [3] a number of axioms they thought 
should hold for such a network of processors. 

A1 Any two input (output) actions can be combined into one single input
(output) action.

A2 For any pair of elements $a, b \in A$, there is an input action that changes a
into b; the same holds for output actions.

A3 For any channel between processors i and j, the order of applying an input
action to i and an output action to j is irrelevant.

Although each processor i was to have its own set of output actions and its
own set of input actions, in [3] (see also [2]) it was derived that under these
axioms the input (output) actions of every vertex are the same and form a
group. Also, the sets of input and output actions coincide, but an action will act
differently on incoming and outgoing edges, as evidenced by the asymmetry in
(2) in Section 2. The difference is made explicit by an anti-involution $\delta$,
which is an anti-automorphism of order at most two on the group of actions. The
notion of anti-involution generalizes that of group inversion. The result of this
will be that if a channel between processors i and j is labelled with a, then the
channel from j to i will be labelled with $\delta(a)$. The model generalizes the
gain graphs of [9] and the voltage graphs of [4].

As we shall see later the graphs labelled with elements from a fixed group A 
(and under some fixed anti-involution of that group), called skew gain graphs in 
the following, are partitioned into equivalence classes. These equivalence classes 
capture the possible outcomes of performing actions in the vertices, i.e., the 
states of the system reachable from a certain “initial” state. The transformation
from one skew gain graph to another is governed by selecting in each vertex an
operation, which corresponds to an element of the group. Although the
equivalence classes themselves are usually considered static objects, it is not
hard to see that there is also a notion of change or dynamics: applying a selector
to a skew gain graph yields a new skew gain graph on the same underlying network
of processors, but possibly with different labels. For this reason the equivalence
classes were called dynamic labelled 2-structures in [3].

Consider now the problem where we have a (target) skew gain graph h which
represents our network, and a skew gain graph g, the query graph, which
represents a fragment of a network which to us has a special meaning, for
instance, it describes a deadlock situation. A question to ask is then: is there a
way to transform h by applying a selector, such that in the result we can detect a
subgraph similar to g? In terms of the example: is there a possible state in the
system, derivable from h, which contains a deadlocked subgraph? If the embedding
from g into h is known, then this can be (in many cases) efficiently solved by
applying the results of Hage [5]. However, the large number of possible embeddings
of g into h remains a problem. In fact, we quickly run into the Graph Isomorphism
problem, which does not have a known efficient solution. In this paper, we seek
to alleviate this problem by seeing how we might reduce the skew gain graph g
to a different, simpler graph without changing the outcome, i.e., if the reduced
graph can be embedded, then so can g.

After introducing our notation for groups, skew gain graphs and switching 
classes thereof, we continue by formulating a general framework for reasoning 
about reductions between skew gain graphs, and give some illustrative and even
surprising examples of such reductions. In some cases they work irrespective of
the group and involution. In the case of bridging on the other hand, where we can
“shorten” the lengths of cycles in our query graph, they generally work only for
certain groups. We give examples of these for the group $\mathbb{Z}_2$ and for the
group $\mathbb{Z}_3$. The correctness of these reductions follows from rather
surprising combinatorial
results. We then show how these results can be used to derive an algorithm for
the embedding problem, showing that the complexity of the embedding problem 
depends on the cycle rank of the query graph and not on the number of vertices. 
Finally, we prove some impossibility results for bridging. 

2 Preliminaries 

In this paper, we use both elementary group theory and graph theory. In this 
section we establish notation, and introduce the concept of switching classes 
of gain graphs with skew gains. For more details on group theory we refer the 
reader to Rotman [7]. 

For a group $\Gamma$ we denote its identity element by $1_\Gamma$. Let $\Gamma$ be
a group. A function $\delta : \Gamma \to \Gamma$ is an anti-involution if it is an
anti-automorphism of order at most two, that is, $\delta$ is a bijection and for
all $x, y \in \Gamma$, $\delta(xy) = \delta(y)\delta(x)$ and $\delta^2(x) = x$. We
write $(\Gamma, \delta)$ for a group $\Gamma$ with a given anti-involution $\delta$.

Define $E_2(V) = \{(u, v) \mid u, v \in V,\ u \neq v\}$, the set of nonreflexive,
directed edges over V. We usually write uv for the edge (u, v), but note that
$uv \neq vu$. For an edge e = uv, the reverse of e is $e^{-1} = vu$.

We consider graphs G = (V, E) where the set of edges $E \subseteq E_2(V)$ satisfies
the following symmetry condition:

if $e \in E$ then also $e^{-1} \in E$.

Such graphs can be considered as undirected graphs where the edges have been 
given a two-way orientation. We use E(G) to denote the edges E of G and 
similarly V ( G ) to denote its vertices V. 

Two vertices $v, v' \in V(G)$ are adjacent in G if $(v, v') \in E(G)$. The degree of a
vertex is the number of vertices in the graph it is adjacent to. A vertex of degree 
zero is called isolated, a leaf has degree one, a chain vertex degree two, and all 
other vertices are called dense vertices. 

A sequence of vertices $p = (v_1, \ldots, v_k)$, k > 0, is a path in G if $v_i$ is
adjacent to $v_{i+1}$ for $i = 1, \ldots, k-1$ and all vertices are distinct. By
E(p) we denote the set of edges $\{(v_1, v_2), \ldots, (v_{k-1}, v_k)\}$.
Additionally, p is called a chain if all vertices $v_2, \ldots, v_{k-1}$ are chain
vertices. The chain p is maximal in G if the endpoints $v_1$ and $v_k$ are not
chain vertices. A cut edge in a graph is an edge which is not on any cycle.

Let G = (V, E) be a graph and $(\Gamma, \delta)$ a group with anti-involution. A
pair (G, g) where g is a mapping $g : E \to \Gamma$ into the group $\Gamma$ is
called a $(\Gamma, \delta)$-gain graph (on G) (or a graph with skew gains or a skew
gain graph), if g satisfies the following reversibility condition

$g(e^{-1}) = \delta(g(e))$ for all $e \in E$. (1)

In the future we will refer to a skew gain graph (G, g) simply by g unless confusion 
arises. We adopt in a natural way some of the terminology of graph theory for 
graphs with skew gains. For instance, every path in G is also a path in g, and 
we can use E(g) to denote the set of edges of the underlying graph G. 

The class of $(\Gamma, \delta)$-gain graphs on G will be denoted by
$L_G(\Gamma, \delta)$ or simply by $L_G$. More importantly,
$L(\Gamma, \delta) = \bigcup \{ L_G(\Gamma, \delta) \mid G \text{ is a graph} \}$.
A gain graph is a $(\Gamma, {}^{-1})$-gain graph; these are also called inversive
skew gain graphs.

With a path $p = (v_1, \ldots, v_k)$ in $g \in L_G(\Gamma, \delta)$ we can
associate the sequence of labels
$\lambda(p) = (g(v_1 v_2), \ldots, g(v_{k-1} v_k))$. Now, p is an a-path if every
value in $\lambda(p)$ is equal to a. Secondly, p is a b-summing path for some
$b \in \Gamma$ if $g(v_1 v_2) \cdot g(v_2 v_3) \cdots g(v_{k-1} v_k)$ equals b. (We
often denote this fact by writing g(p) = b.) In other words, evaluating the
product of values found along p using the group operation $\cdot$ of $\Gamma$
yields the group element b.

Let $g \in L_G(\Gamma, \delta)$. A set $X \subseteq V(G)$ is an a-clique if for all
$x, y \in X$: $x \neq y$ implies g(x, y) = a. Also, for $X, Y \subseteq V(G)$, X is
said to be a-connected to Y if $X \cap Y = \emptyset$ and g(x, y) = a for all
$x \in X$, $y \in Y$.

A function $\sigma : V \to \Gamma$ is called a selector. With each selector
$\sigma$ we associate a $(\Gamma, \delta)$-gain graph $g^\sigma$ on G = (V, E) by
letting, for each $uv \in E$,

$g^\sigma(uv) = \sigma(u)\, g(uv)\, \delta(\sigma(v))$. (2)



Example 1. 

To illustrate switching, consider $g_1$ and $g_2$, the
$(\mathbb{Z}_4, \mathrm{id})$-gain graphs of Figure 1 (a) and (b) respectively (the
group $\mathbb{Z}_4$ is the group of addition modulo 4; the involution is the
identity function, giving rise to a symmetric graph). The second of these, $g_2$,
can be obtained from $g_1$ by applying the selector $\sigma$ that maps both 1 and 3
to 3, and both 2 and 4 to 1. For example, the label of the edge (1, 3) is computed
as follows: $g_2(1, 3) = g_1^\sigma(1, 3) = \sigma(1)\, g_1(1, 3)\,
\delta(\sigma(3)) = 3 + 1 + \delta(3) = 3 + 1 + 3 = 3$, where + is of course
addition modulo 4. The path p = (1, 3, 4, 1) is 3-summing in $g_1$ (here
$\lambda(p)$ equals (1, 0, 2)) and 1-summing in $g_2$ (here $\lambda(p)$ equals
(3, 0, 2)). Neither is a 0-path, while (1, 2, 3, 4) is a 0-path in both $g_1$ and
$g_2$.

Fig. 1. Two elements of $L_G(\mathbb{Z}_4, \mathrm{id})$.
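The computation in Example 1 can be replayed mechanically. The following Python sketch is only an illustration: the helper names `switch`, `with_reverses` and `path_labels` are ours, and the edge labels of $g_1$ are the ones that can be recovered from the text of the example (the label of the edge (2, 4) appears only in the figure and is therefore left out).

```python
def switch(g, sigma, mul, delta):
    """Return g^sigma, where g^sigma(uv) = sigma(u) * g(uv) * delta(sigma(v))."""
    return {(u, v): mul(mul(sigma[u], a), delta(sigma[v]))
            for (u, v), a in g.items()}

def with_reverses(half, delta):
    """Add the reverse of every edge so that g(vu) = delta(g(uv)) holds."""
    g = dict(half)
    g.update({(v, u): delta(a) for (u, v), a in half.items()})
    return g

def path_labels(g, path):
    """The label sequence lambda(p) of a vertex sequence p."""
    return tuple(g[(u, v)] for u, v in zip(path, path[1:]))

add4 = lambda a, b: (a + b) % 4   # group operation of Z_4
identity = lambda a: a            # the involution used in Example 1

# Labels of g_1 taken from the text of Example 1 (edge (2, 4) is unknown here).
g1 = with_reverses({(1, 2): 0, (2, 3): 0, (3, 4): 0, (1, 3): 1, (4, 1): 2},
                   identity)
sigma = {1: 3, 3: 3, 2: 1, 4: 1}
g2 = switch(g1, sigma, add4, identity)

print(path_labels(g1, (1, 3, 4, 1)), path_labels(g2, (1, 3, 4, 1)))
```

Passing the group operation and the involution as parameters keeps the sketch usable for other groups, for instance with inversion instead of the identity.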

The class $[g] \subseteq L_G(\Gamma, \delta)$ defined by

$[g] = \{ g^\sigma \mid \sigma : V \to \Gamma \}$

is called the switching class generated by g.

It is not difficult to prove that a switching class is an equivalence class of
skew gain graphs. The underlying equivalence relation $\sim$ on
$L_G(\Gamma, \delta)$ is that $g \sim g'$ for $g, g' \in L_G(\Gamma, \delta)$ if and
only if there exists $\sigma : V(G) \to \Gamma$ such that $g' = g^\sigma$.
Obviously $g \sim g$, and if $g_1 \sim g_2$ then also $g_2 \sim g_1$, because
$g_1^\sigma = g_2$ if and only if $g_1 = g_2^{\sigma^{-1}}$, where $\sigma^{-1}$ is
such that $\sigma^{-1}(v) = \sigma(v)^{-1}$ for all $v \in V$.

Closure under composition of selectors is something that we would expect in
our model: it is a consequence of Axiom A1 of the introduction. If we define the
composition of two selectors $\sigma$ and $\tau$ to be
$\sigma\tau(v) = \sigma(v)\tau(v)$, then we can prove that for each
$g \in L_G(\Gamma, \delta)$ and selectors $\sigma, \tau$:
$g^{\sigma\tau} = (g^\tau)^\sigma$.

If the group $\Gamma$ is the cyclic group of order 2, $\mathbb{Z}_2$, then by
necessity the involution is the identity function and the skew gain graphs are
exactly the undirected simple graphs of, e.g., [1, 6]. Directed graphs are obtained
by choosing $\Gamma = \mathbb{Z}_4$ and taking the involution $\delta$ to be the
group inversion.



3 The General Framework 

In the following let $\Gamma$ be a fixed, but arbitrary group and $\delta$ a fixed,
but arbitrary involution of $\Gamma$.

Let $g \in L_G(\Gamma, \delta)$ and $h \in L_H(\Gamma, \delta)$ be skew gain graphs.
An injection $\psi : V(G) \to V(H)$ embeds g into h, denoted by
$g \hookrightarrow^{\psi} h$, if

$g(uv) = h(\psi(u)\psi(v))$ for all $uv \in E(G)$.

If we do not care what $\psi$ is, we write $g \hookrightarrow h$ instead. Note that
in some definitions of embedding there is also an injection on the labels, but
since our application attaches meaning to the labels, we do not allow that here.

The embedding $\psi$ is an isomorphism from g to h if $g \hookrightarrow^{\psi} h$
and $h \hookrightarrow^{\psi^{-1}} g$. We denote this fact by $g \cong_{\psi} h$,
or, equivalently, $h \cong_{\psi^{-1}} g$.
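The embedding condition is a plain label comparison under an injection on the vertices. As a minimal sketch (the function names `embeds` and `find_embedding` and the small example graphs are our own, not taken from the paper), it can be checked in Python as follows; skew gain graphs are represented as dictionaries from directed edges to labels.

```python
from itertools import permutations

def embeds(g, h, psi):
    """True if the injection psi embeds g into h, that is,
    g(uv) = h(psi(u) psi(v)) for every edge uv of g."""
    if len(set(psi.values())) != len(psi):
        return False  # psi is not injective
    return all((psi[u], psi[v]) in h and h[(psi[u], psi[v])] == a
               for (u, v), a in g.items())

def find_embedding(g, h, g_vertices, h_vertices):
    """Brute-force search over all injections V(G) -> V(H)."""
    for image in permutations(h_vertices, len(g_vertices)):
        psi = dict(zip(g_vertices, image))
        if embeds(g, h, psi):
            return psi
    return None

# A total target over Z_2 on three vertices and a one-edge query graph.
h = {(0, 1): 1, (1, 0): 1, (0, 2): 0, (2, 0): 0, (1, 2): 1, (2, 1): 1}
g = {("a", "b"): 0, ("b", "a"): 0}
print(find_embedding(g, h, ["a", "b"], [0, 1, 2]))
```

The exhaustive search over injections is exactly the source of the cost that the rest of the paper tries to reduce.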

The definition of embedding can be extended to switching classes in a natural 
way: 

$g \hookrightarrow [h]$ if and only if there exists $h' \in [h]$ such that
$g \hookrightarrow h'$.

In this and the following sections, the central problem is to decide whether the
query skew gain graph $g \in L_G(\Gamma, \delta)$ can be embedded in a switch of the
target skew gain graph $h \in L_H(\Gamma, \delta)$.

We assume for the remainder of the paper that the target skew gain graph
is total, meaning that $H = (V, E_2(V))$ for some set of vertices V.

We now come to the definitions central to this paper. We are interested in
establishing, for a certain query graph g, into which other skew gain graph g'
it may be transformed so that the ability of embedding g into h is preserved
and reflected into g'. More formally, we define $\mathcal{R}(\Gamma, \delta)$ as
the set of embedding equivalent pairs
$(g, g') \in L(\Gamma, \delta) \times L(\Gamma, \delta)$ such that

$\forall h : g \hookrightarrow [h] \iff g' \hookrightarrow [h]$.

Note that in our definition we have left the embedding itself unspecified, meaning 
that in general we do not care whether g and g' are embedded “in the same 
place”. It also implies that g and g' may have different underlying graphs. 

Although we have just defined the largest possible (equivalence) relation 
relating skew gain graphs from $L(\Gamma, \delta)$ to each other, it does not give us any
concrete information which pairs are actually in the relation for a given group 
and involution. In the remainder of this paper we shall establish a number of 
results which either show that some pairs are definitely in this relation, or that 
some pairs can never be. 

Let R be any equivalence relation on $L(\Gamma, \delta)$. R is an embedding
invariant relation (emir) if $(g, g') \in R$ implies
$(g, g') \in \mathcal{R}(\Gamma, \delta)$.

We now give some examples of emirs that occur in the literature. The following
easy lemma shows that for embedding the identities of the vertices of the query
graph are unimportant.

Lemma 1. 

For two isomorphic $(\Gamma, \delta)$-gain graphs g and g' (with isomorphism $\phi$
from g to g'): if $g \hookrightarrow h$, then $g' \hookrightarrow h$.

The second example is that embedding a query graph g is the same as em- 
bedding one of its switches: 

Lemma 2. 

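If $g \hookrightarrow^{\psi} [h]$, then also $g^\sigma \hookrightarrow^{\psi} [h]$
for any selector $\sigma : V(g) \to \Gamma$.

Note that Lemma 1 implies the existence of an emir $R_{\cong}$:
$(g, g') \in R_{\cong}$ if and only if $g \cong g'$. Another example comes from
Lemma 2, which shows that the switching equivalence $\sim$ (g and g' are related if
g' is a switch of g) is in fact an emir.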

We shall now give a slightly more complicated example. 

Define $R_{\mathrm{DCR}}$ such that $(g, g') \in R_{\mathrm{DCR}}$ if g' can be
obtained from g by removing any number of cut edges of g. The symmetric closure of
this relation, $R_{\mathrm{CR}}$, is an equivalence relation on
$(\Gamma, \delta)$-gain graphs. So any two g and g' are related if and only if they
have exactly the same cycles and the same domain. A basic result from the theory of
switching classes proves that this relation is in fact an emir (see for instance
[8]):

Theorem 1. 

Let H be a graph and let T be a subgraph of H that is a forest. For every
$(\Gamma, \delta)$-gain graph g on T and every h on H: $g \hookrightarrow [h]$.

The theorem states that any acyclic structure can be embedded in a
switching class. Note that by removing edges we do not change the size of the
domain of the $(\Gamma, \delta)$-gain graph; this is necessary for establishing
embedding invariance.
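For the special case of the group $\mathbb{Z}_m$ with inversion as involution, the selector promised by Theorem 1 can be computed by walking over each tree of the forest. The sketch below is an illustration under that assumption only (the theorem itself holds for arbitrary groups and involutions); the function name `selector_for_forest` and the example data are hypothetical. It uses that switching gives $h^\sigma(uv) = \sigma(u) + h(uv) - \sigma(v)$ modulo m.

```python
from collections import deque

def selector_for_forest(g, h, vertices, m):
    """Find sigma with h^sigma(uv) = g(uv) on every edge of the forest g.

    Labels live in Z_m with inversion as involution, so
    h^sigma(uv) = sigma(u) + h(uv) - sigma(v)  (mod m).
    """
    neighbours = {v: [] for v in vertices}
    for (u, v) in g:
        neighbours[u].append(v)
    sigma = {}
    for root in vertices:
        if root in sigma:
            continue
        sigma[root] = 0                  # free choice, one per tree
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in neighbours[u]:
                if v not in sigma:
                    # solve sigma(u) + h(uv) - sigma(v) = g(uv) for sigma(v)
                    sigma[v] = (sigma[u] + h[(u, v)] - g[(u, v)]) % m
                    queue.append(v)
    return sigma

m = 3
V = [0, 1, 2, 3]
h = {}
for u in V:
    for v in V:
        if u < v:
            h[(u, v)] = (u + 2 * v) % m      # arbitrary labels for the demo
            h[(v, u)] = -h[(u, v)] % m       # reversibility under inversion
g = {}
for (u, v), a in {(0, 1): 1, (1, 2): 2, (1, 3): 0}.items():
    g[(u, v)] = a
    g[(v, u)] = -a % m

sigma = selector_for_forest(g, h, V, m)
switched = {(u, v): (sigma[u] + a - sigma[v]) % m for (u, v), a in h.items()}
print(all(switched[e] == g[e] for e in g))
```

Because a forest has no cycles, the value at each vertex is fixed exactly once, so no conflicting constraints can arise.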

To combine two emirs into one we can use the join operation: for two emirs
R and R' on $L(\Gamma, \delta)$, the join of R and R', denoted by $R \vee R'$, is
the smallest equivalence relation including both R and R'.

Lemma 3. 

If R and R' are emirs, then the join of R and R' is an emir. 

The join can be used to combine various emirs into a larger one. For instance,
joining an emir such as $R_{\mathrm{DCR}}$ with the isomorphism emir of Lemma 1
yields an emir that “incorporates” removing of cut edges and isomorphisms. In such
a way we can define various emirs and compose these to come as close as possible to
the largest of emirs, $\mathcal{R}(\Gamma, \delta)$.

4 Bridging 

In this and the coming sections we assume that the group $\Gamma$ is abelian and
that the involution $\delta$ is the group inversion; we will denote the identity of
the group simply by 0.

The reason for the restriction is the Cyclic Sum Invariance, which holds for
switching classes with abelian groups and involution equal to the group inversion.
It means that if one takes any cycle and computes the sum of the labels along that
cycle, then this value does not change when the skew gain graph is switched [2].
The skew gain graphs $g_1$ and $g_2$ of Example 1 show that in general this result
does not hold, because the cycle (1, 3, 4, 1) sums to different
values in each of them, even though they are in the same switching class.
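The contrast between the two involutions can be checked numerically on the cycle (1, 3, 4, 1) of Example 1. This is a sketch with our own helper names (`cycle_sum`, `switch_zm`); the labels of `g_id` are those read off the example, while `g_inv` is a hypothetical variant of the same cycle whose reverse edges are negated so that it is a gain graph under inversion.

```python
def cycle_sum(g, cycle, m):
    """Sum in Z_m of the labels along the closed walk cycle = (v_0, ..., v_0)."""
    return sum(g[(u, v)] for u, v in zip(cycle, cycle[1:])) % m

def switch_zm(g, sigma, m, inversion=True):
    """Switch g over Z_m; delta is inversion or, if inversion=False, identity."""
    sign = -1 if inversion else 1
    return {(u, v): (sigma[u] + a + sign * sigma[v]) % m
            for (u, v), a in g.items()}

cycle = (1, 3, 4, 1)
sigma = {1: 3, 3: 3, 4: 1}

# Identity involution (as in Example 1): reverse edges carry the same label.
g_id = {(1, 3): 1, (3, 1): 1, (3, 4): 0, (4, 3): 0, (4, 1): 2, (1, 4): 2}
# Inversion as involution: reverse edges carry the negated label modulo 4.
g_inv = {(1, 3): 1, (3, 1): 3, (3, 4): 0, (4, 3): 0, (4, 1): 2, (1, 4): 2}

before_id = cycle_sum(g_id, cycle, 4)
after_id = cycle_sum(switch_zm(g_id, sigma, 4, inversion=False), cycle, 4)
before_inv = cycle_sum(g_inv, cycle, 4)
after_inv = cycle_sum(switch_zm(g_inv, sigma, 4, inversion=True), cycle, 4)
print(before_id, after_id, before_inv, after_inv)
```

With inversion every selector value enters the cycle sum once with each sign and cancels; with the identity involution it is added twice, which is why the sum can change.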

Let $g, g' \in L(\Gamma, \delta)$ be such that g contains a 0-chain
$p = (x_0, \ldots, x_k)$. Then, for integers k and $\ell$ with $k > \ell$, g' is a
$(k, \ell)$-bridging of g, denoted $g\, B^k_\ell\, g'$, if g' is a
$(\Gamma, \delta)$-gain graph on V(g) with

$E(g') = (E(g) - E(p)) \cup E(p')$ for $p' = (x_0, \ldots, x_{\ell-1}, x_k)$

and

$g'(e) = \begin{cases} g(e), & \text{if } e \in E(g) - E(p) \\ 0, & \text{otherwise.} \end{cases}$

We additionally define $B^k_\ell$ to be equal to $(B^\ell_k)^{-1}$ for $k < \ell$.

For the following two $(\Gamma, \delta)$-gain graphs g (left) and g' (right) it
holds that g' is a bridging of g. In this case $p = (x_0, \ldots, x_5)$:

Note that we can assume that the chain is labelled with zeroes, since if it
is not, we can always switch it so that it is. Also, note that the definition
also would have allowed to exclude X 4 and x§ by choosing {x\, . . . ,x§) as our 
0-chain.

In what follows we are interested in determining for which groups we can
always (i.e., for any $(\Gamma, \delta)$-gain graph $g \in L(\Gamma, \delta)$)
change bridges of length k to bridges of length $\ell$. For this we introduce the
following relation $R_\Gamma \subseteq \mathbb{N} \times \mathbb{N}$, where
$(k, \ell) \in R_\Gamma$ if and only if the $(k, \ell)$-bridging relation
$B^k_\ell$ is an emir on $L(\Gamma, \delta)$. Obviously, for any group $\Gamma$ it
holds that $(k, k) \in R_\Gamma$ where k > 0.

The following lemma couples the concept of bridging to something we can 
more easily verify. Implicitly we allow the embedding only to be changed on the 
chain vertices that occur on the bridge. 

A $(\Gamma, \delta)$-gain graph on $\{0, \ldots, n\}$ for some n is an
(n, k)-bridge structure if it has a 0-path $(0, \ldots, k)$.

The following lemma shows that to decide whether we can bridge paths of
length k into paths of length $\ell$, we can look at total skew gain graphs which
have a 0-labelled path $(0, \ldots, k)$ and show that whatever labels are on the
other edges, we can always find a 0-summing path from 0 to k of length $\ell$.

Lemma 4. 

Let k and $\ell$ be natural numbers, and let $n = \max(k, \ell)$. It holds that
$(k, \ell) \in R_\Gamma$ if and only if for every (n, k)-bridge structure g there
is a 0-summing path p in g of length $\ell$ from 0 to k.

Proof. For the if-part we need only note that we can replace a path of length k
by one of length $\ell$ which has the same starting and end-point without changing
the sum on any of the cycles of which these vertices are part. Because a bridge
is part of a chain, every vertex on the bridge is part of exactly the same cycles.

The only-if part follows from the fact that if we cannot replace the path of
length k by a path of the same sum of length $\ell$, then we change the cyclic sum
along at least one cycle, which contradicts the Cyclic Sum Invariance.
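The criterion of Lemma 4 is finite for a finite group, so for small parameters it can be checked by exhaustive search. The following Python sketch does this for the cyclic group $\mathbb{Z}_m$ with inversion; the function names `has_zero_path` and `bridgeable` are ours, and the code is a brute-force illustration rather than the program used by the authors. A path of length $\ell$ is taken to mean a path with $\ell$ edges.

```python
from itertools import permutations, product

def has_zero_path(label, n, k, l, m):
    """True if some path with l edges from 0 to k sums to 0 in Z_m."""
    inner = [v for v in range(n + 1) if v not in (0, k)]
    for mid in permutations(inner, l - 1):
        path = (0, *mid, k)
        if sum(label(u, v) for u, v in zip(path, path[1:])) % m == 0:
            return True
    return False

def bridgeable(k, l, m):
    """Check the condition of Lemma 4 for Z_m by enumerating every
    (n, k)-bridge structure on {0, ..., n}, n = max(k, l)."""
    n = max(k, l)
    # all edges u < v except those of the 0-path (0, ..., k)
    free = [(u, v) for u in range(n + 1) for v in range(u + 1, n + 1)
            if not (v == u + 1 and v <= k)]
    for values in product(range(m), repeat=len(free)):
        g = dict(zip(free, values))

        def label(u, v):
            if u < v:
                return g.get((u, v), 0)    # path edges are labelled 0
            return -g.get((v, u), 0) % m   # reverse edge: inverted label

        if not has_zero_path(label, n, k, l, m):
            return False                   # found a counterexample structure
    return True

print(bridgeable(5, 3, 2), bridgeable(4, 3, 2))
```

The number of structures grows as $m$ to the power of the number of free edges, so this is only practical for very small k, $\ell$ and m.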

Theorem 2. 

For natural numbers $k_1 > 1$ and $k_2 > 2$: $(k_1, 1) \notin R_\Gamma$ and
$(k_2, 2) \notin R_\Gamma$, if $\Gamma$ is not the trivial group {0}.

Proof. Let $a \in \Gamma$ with $a \neq 0$. Let $g_2$ be a $(\Gamma, \delta)$-gain
graph on $\{0, \ldots, k_2\}$ such that for $1 < i < k_2 - 1$, $g_2(0, i) = 0$,
$g_2(i, k_2) = a$ and all other edges are labelled arbitrarily. Hence for all i,
$g_2(0, i, k_2) = a \neq 0$. The same kind of reasoning can
be applied to the other case.

Example 2. 

If we know that $(5, 3) \in R_\Gamma$, then it is easy to see that
$(k, k - (5 - 3)) = (k, k - 2) \in R_\Gamma$ as long as $k - 2 \geq 3$: if g
contains a chain of length greater than 5, then we can take any part of this chain
of length 5 and reduce it to 3 and thereby reduce the length of the entire chain
from k to k - 2. We can repeat this process until the chain is not sufficiently
long anymore. We conclude that if we prove that $(5, 3) \in R_\Gamma$ then
$(k, k - 2) \in R_\Gamma$ for $k \geq 5$ and even $(k, k - 2\ell) \in R_\Gamma$ for
$k - 2\ell \geq 3$. Using similar reasoning we conclude that
$(3, 5) \in R_\Gamma$ implies that $(k, k + 2\ell) \in R_\Gamma$ for $k \geq 3$.

In general we have 

Lemma 5. 

If $(k_1, \ell_1) \in R_\Gamma$ then $(k_2, \ell_2) \in R_\Gamma$ where
$\ell_2 = k_2 - (k_1 - \ell_1)m$, $m \geq 1$ and $\ell_2 \geq \ell_1$.

If $\Gamma = \Gamma_1 \times \Gamma_2$ then $(k, \ell) \in R_\Gamma$ implies
$(k, \ell) \in R_{\Gamma_i}$ (i = 1, 2), but not vice versa, not even if
$\Gamma_1 = \Gamma_2$ (see Theorem 4). The positive result is easy, because
the identity of $\Gamma$ maps to the identities of the factors. Hence the
0-summing paths stay 0-summing in the projection. The following result says that
if a bridging is not possible for a given group, it automatically precludes
bridging in groups of which it is a factor.

Lemma 6. 

If $\Gamma$ is a group such that $(k, \ell) \notin R_\Gamma$, then this also holds
for all groups of which $\Gamma$ is a factor.



4.1 The Case for $\mathbb{Z}_2$

In view of Theorem 2 it may be surprising that bridgings do exist. 

Lemma 7. 

$(5, 3) \in R_{\mathbb{Z}_2}$.

Proof. Let b be a (5, 5)-bridge structure (recall that the path (0, ..., 5) is a
0-path, and the other edges are arbitrarily labelled by elements of
$\mathbb{Z}_2$). Now, if b(0, 3) = 0, then b(0, 3, 4, 5) = 0. The same reasoning
applies to (2, 5). In the other cases, b(0, 3) = 1 = b(2, 5) and
b(0, 3, 2, 5) = 0.

Knowing that bridging is possible under $\mathbb{Z}_2$ we can now illustrate the
necessity of the target skew gain graph being total: take a cycle on six vertices
and call it c. We can bridge c into d, which consists of two isolated vertices and
a cycle on four vertices. Obviously, $c \hookrightarrow [c]$, but
$d \not\hookrightarrow [c]$. The reason is that the
target graph, in this case c, does not have all its edges present.
Lemma 8.

$(k, \ell) \notin R_{\mathbb{Z}_2}$ if k and $\ell$ are of opposite parity.

Proof. Let k and $\ell$ be of opposite parity. We may assume $\ell, k \geq 3$,
because of Theorem 2.

Let $n = \max(k, \ell)$, let b be an (n, k)-bridge structure and V = V(b). By
Lemma 4, we only need to exhibit one such structure which has no path of length
$\ell$ from 0 to k which sums to 0. For that, choose b such that the sets
$K \subseteq V$ and V - K are 0-connected 1-cliques. Here,
$K = \{x \mid 0 \leq x \leq k,\ x \text{ even}\}$. Note that there is
a 0-path (0, 1, ..., k).

We are interested in paths of length $\ell$ which go from 0 to k and sum to 0. If k
is even, then $\ell$ is odd, and the path is one that starts in K and ends in K.
Since we must switch between K and V - K an even number of times, we traverse an
odd number of edges within either K or V - K. Since these edges each contribute
1 to the sum, and they are the only edges which contribute, the sum along the
path equals 1. If k is odd, and hence the path starts in K and ends in V - K,
similar reasoning leads to a sum of 1.
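The structure used in this proof is easy to build and test directly. The sketch below (helper names `lemma8_structure` and `zero_paths` are ours) constructs the labelling in which K and V - K are 1-cliques that are 0-connected to each other, and lists every path with $\ell$ edges from 0 to k that sums to 0 in $\mathbb{Z}_2$.

```python
from itertools import permutations

def lemma8_structure(k, l):
    """Return n = max(k, l) and the label function of the proof:
    edges inside K or inside V - K get 1, edges between them get 0."""
    n = max(k, l)
    K = {x for x in range(k + 1) if x % 2 == 0}
    return n, (lambda u, v: 1 if (u in K) == (v in K) else 0)

def zero_paths(k, l):
    """All paths with l edges from 0 to k that sum to 0 in Z_2."""
    n, label = lemma8_structure(k, l)
    inner = [v for v in range(1, n + 1) if v != k]
    found = []
    for mid in permutations(inner, l - 1):
        path = (0, *mid, k)
        if sum(label(u, v) for u, v in zip(path, path[1:])) % 2 == 0:
            found.append(path)
    return found

print(zero_paths(4, 3), len(zero_paths(5, 3)))
```

For parameters of opposite parity the list is empty, as the parity argument predicts; for equal parity every path in this particular structure sums to 0.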

Theorem 2 and Lemmas 5, 7 and 8 lead to the following. 

Corollary 1. 

If $k > \ell > 2$ then $(k, \ell) \in R_{\mathbb{Z}_2}$ if and only if k and $\ell$
have the same parity. Also, $(k, \ell) \in R_{\mathbb{Z}_2}$ for
$1 \leq \ell \leq 2$ if and only if $k = \ell$.

4.2 The Case for $\mathbb{Z}_3$

The following result was quite a surprise. 

Lemma 9. 

$(6, 4) \in R_{\mathbb{Z}_3}$ and $(6, 5) \in R_{\mathbb{Z}_3}$, but
$(5, 4) \notin R_{\mathbb{Z}_3}$.

Proof. The positive results have been obtained by a computer check of all paths
of length 4 and 5, respectively, from 0 to 6 in a (6, 6)-bridge structure.

The counterexample for (5, 4) is given in the following figure, where the solid
edges are labelled with 0 and the dashed edges (in the direction of the arrow)
with 1. The reader may verify that indeed no path from 0 to 6 of length 5 sums
to 0.




5 An Algorithm for Checking Configuration Containment 

In this section we use the fact that $(5, 3) \in R_{\mathbb{Z}_2}$ (Lemma 7) to
derive an algorithm for checking that $g \hookrightarrow [h]$ for
$g, h \in L(\mathbb{Z}_2, {}^{-1})$ where h is total. The result is
mainly based on the following graph theoretical argument which shows that if
we consider graphs that do not have any isolated vertices or leaves, and every
chain contains a bounded number of chain vertices, then the number of vertices
in the graph can be bounded by a constant multiple of the cycle rank of the
graph. The cycle rank of a graph G is defined as the size of its cycle base, and
equals e - n + k, where n = |V(G)|, e = |E(G)| and k is the number of connected
components of G.
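The formula e - n + k is straightforward to evaluate. The following Python sketch computes it with a small union-find; the function name `cycle_rank` is ours, and we assume that the two orientations uv and vu of an edge are counted once, as for an undirected graph.

```python
def cycle_rank(vertices, edges):
    """Cycle rank e - n + k of an undirected graph, where k is the number
    of connected components. Edges may be given in one or both directions."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    undirected = {frozenset(e) for e in edges}  # count uv and vu once
    components = len(parent)
    for e in undirected:
        u, v = tuple(e)
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return len(undirected) - len(parent) + components

six_cycle = [(i, (i + 1) % 6) for i in range(6)]
print(cycle_rank(range(6), six_cycle))
```

For example, a cycle on six vertices and the graph consisting of a cycle on four vertices plus two isolated vertices both have cycle rank 1.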

Lemma 10. Let G = (V, E) be a connected graph which has only vertices of
degree at least two and at least one dense vertex. If every maximal chain in G
has at most c > 0 chain vertices, then $|V(G)| \leq 2c\xi$, where $\xi$ is the
cycle rank of G.

Proof. We first make an estimation for graphs which only contain dense vertices.
Let $d_G(v)$ denote the degree of the vertex v of G. Then, by the handshaking
lemma of graph theory, $2e = \sum_{v \in V} d_G(v) \geq 3n$, since
$d_G(v) \geq 3$ for all v. Hence
$\xi = e - n + 1 \geq 3n/2 - n + 1 = n/2 + 1$, so that $n \leq 2\xi$ as required.
Now, any edge between two dense vertices can be replaced by a chain of at most c
chain vertices, which adds to n and e in equal amounts, so that $n \leq 2c\xi$.

Lemma 11. 

Let $g \in L(\mathbb{Z}_2, {}^{-1})$ and let $\xi$ be the cycle rank of g. Then,
there exists a g' embedding equivalent with g such that $ni(g') \leq 6\xi$, where
ni(g') is the number of non-isolated vertices of g'.

Proof. Remove cut edges and isolated vertices, and use the (5, 3) bridging to
change g into g', which has the property that it consists only of a number of
isolated vertices, dense vertices and chain vertices. None of these operations
changes the cycle rank of g. Now apply Lemma 10 to each of the components of the
graph (the cycle rank of a disconnected graph equals the sum of the cycle ranks of
its components) to obtain the given bound for the number of chain and dense
vertices, where we use Lemma 7 to limit the number of chain vertices in any chain
to 3. We omit in this reasoning components which are simple cycles: connected
graphs which have only chain vertices. These, however, can all be reduced to
cycles of length at most six, again using Lemma 7.

Finally, we can formulate a bound on the time complexity of the embedding 
problem for Z 2 as follows: 

Theorem 3. 

Let $g, h \in L(\mathbb{Z}_2, {}^{-1})$ where h is total, n = |V(h)| and $\xi$ is
the cycle rank of g. It can be decided in $O(n^{6\xi+2})$ time whether
$g \hookrightarrow [h]$.

Proof. After checking that $|V(g)| \leq |V(h)|$, we can find an embedding
equivalent g' such that $ni(g') \leq 6\xi$ through Lemma 11. Now, we actually
remove the isolated vertices from g' (we have already checked that g does not have
more vertices than h does). The number of possible injections from g' into h is
bounded by $n^{6\xi}$, for each of which we have to do at most $O(n^2)$ work to see
if under the injection, we can switch h so that it contains g' (using the results
of [5]). The preprocessing of g, which consists of removing leaves, isolated
vertices and shortening chains, can easily be done in time $O(n^2)$.
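The bound of Theorem 3 can be read off by adding up the three costs named in the proof. As a sketch, with $\xi$ the cycle rank of g and n = |V(h)| as in the theorem:

```latex
\[
\underbrace{O(n^{2})}_{\text{preprocessing of } g}
\;+\;
\underbrace{n^{6\xi}}_{\text{injections of } g' \text{ into } h}
\cdot
\underbrace{O(n^{2})}_{\text{switching test per injection}}
\;=\; O\!\left(n^{6\xi+2}\right).
\]
```

The exponent depends only on the cycle rank of the query graph, not on its number of vertices.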

6 Some Impossibility Results 

In this section we are interested in determining, given a natural number $\ell$,
for which finitely generated abelian groups $\Gamma$ it holds for every
$k > \ell$ that $(k, \ell) \notin R_\Gamma$. In Theorem 2 we found two such
examples, $\ell = 1$ and $\ell = 2$, in which case impossibility was obtained for
all non-trivial groups. Since we have already treated the cases for
$\ell \leq 2$, we assume $\ell \geq 3$, and hence k > 3. From Lemma 6 and the
fundamental result on finitely generated abelian groups, it follows that we can
restrict ourselves to solving this question for the cyclic groups (of order a
prime power) and $\mathbb{Z}$.

Since we are interested in proving the impossibility of bridging, we have to
show that we can always find (k, k)-bridge structures in which there is no
0-summing path of length $\ell$ from 0 to k.

First we investigate which edges in the bridge structures must be labelled
with a non-identity element. These are exactly the edges that are on a path of
length $\ell$ from 0 to k which traverses only edges on the path (0, ..., k),
except for one edge which has an undetermined label. We observe that these edges
are those of the form

$(i, i + (k - \ell + 1)), \quad i = 0, \ldots, \ell - 1.$ (3)
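Formula (3) can be checked by listing the edges and the paths they lie on. In the sketch below the helper names `forced_edges` and `path_through` are ours; each returned path follows (0, ..., k) except for a single jump, and therefore has exactly $\ell$ edges.

```python
def forced_edges(k, l):
    """Edges (i, i + (k - l + 1)) for i = 0, ..., l - 1, as in (3)."""
    jump = k - l + 1
    return [(i, i + jump) for i in range(l)]

def path_through(edge, k):
    """The path from 0 to k that follows (0, ..., k) except for one jump."""
    i, j = edge
    return list(range(0, i + 1)) + list(range(j, k + 1))

for e in forced_edges(5, 3):
    print(e, path_through(e, 5))
```

For (k, $\ell$) = (5, 3) this yields the edges (0, 3), (1, 4) and (2, 5); the first and last belong to the paths (0, 3, 4, 5) and (0, 1, 2, 5) used in the proof of Lemma 7.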

We shall next prove that the only bridging (k, 3) for k > 3 occurs if the group
is trivial or the group is $\mathbb{Z}_2$. The main technique used here is to
generate a family of skew gain graphs, depending on k, which contains a large
0-clique X, and only relatively few other edges. Parts of the paths in the
0-clique contribute nothing to the sum along a path, so only the values on the
other edges really matter. To simplify the proof any vertex outside X is connected
in a uniform way to all vertices in X and (by reversibility) the other way around.
In 2-structures jargon, such a set X is called a clan (see [2]). The next theorem
is a typical example of this kind and can be viewed as an illustration of the
proof technique.

Theorem 4. 

If for k > 3, $(k, 3) \in R_\Gamma$ for a finitely generated abelian group
$\Gamma$, then $\Gamma$ is either $\mathbb{Z}_2$ or the trivial group.

Proof. Like in Theorem 2 the idea is to find a (k, k)-bridge structure which does
not exhibit a 0-summing path of length $\ell = 3$ from 0 to k. Because of Lemma 6
and the fundamental theorem on finitely generated abelian groups, we start by
considering the cyclic groups of order larger than two and the group $\mathbb{Z}$
of integer addition.

Consider the following graph in which all edges whose value is as yet unknown
are labelled with a variable label $a_i$ for some i, and the vertex X represents a
0-clique on $k - \ell$ vertices.

By (3), $a_0$, $a_1$ and $a_2$ should be labelled by values different from 0. It
is easily seen that also $a_4 \neq 0$ (for paths through X). We also find that
$a_0 \neq a_4^{-1}$, because of the path (0, k - 2, x, k) where $x \in X$. In fact
if we set $a_3 = 0$, $a_0$, $a_2$ and $a_4$ to the generator of the group, 1, and
$a_1$ to $1^{-1}$, there is no path of length $\ell = 3$ which sums to 0. It is
important to note that since the group has order at least three,
$1 \neq 1^{-1}$.

Since a (k, 3) bridging existed for $\mathbb{Z}_2$, we should also show that such
a bridging is not possible for $\mathbb{Z}_2 \times \mathbb{Z}_2$. Taking the same
graph as our starting point, we choose $a_3$ the identity (0, 0) and set
$a_0 = a_4 = a_2 = (0, 1)$ and $a_1 = (1, 0)$. Again, the reader can verify (there
are only a finite number of cases) that no path of length
3 from 0 to k sums to (0, 0).

7 Conclusions and Future Work 

Taking the model of Ehrenfeucht and Rozenberg as our starting point, we have
considered the embedding problem in detail. We have set up a framework to
establish results about reducing query skew gain graphs to smaller ones and
proved some general results in this matter. Then we concentrated on bridging,
which, for $\mathbb{Z}_2$ at least, results in an algorithm for the embedding
problem which is dominated not by the size of the query graph, but by its cycle
rank, corresponding to the general intuition in switching classes that cycles make
life harder.

We have not completed a full investigation of all possible bridgings for all
possible finitely generated abelian groups, although we have the full picture for
$\mathbb{Z}_2$ and $\mathbb{Z}_3$. We do conjecture that for every such group
$\Gamma$ there is for every $\ell$ a k such that $(k, \ell) \notin R_\Gamma$.
Note, by the way, that bridging is just one possible reduction strategy and others
might exist. In that sense, the research in this area is still very much open,
especially for non-abelian groups where bridging is not an option.
Abstract. A synchronizer is intended to allow synchronous algorithms
to be executed on asynchronous networks. It is useful because designing
synchronous algorithms is generally much easier than designing
asynchronous ones. In this paper, we provide synchronization protocols
described as local computations. We obtain a general and unified
approach for handling synchrony in the framework of local computations.



1 Introduction 

Distributed algorithms are studied under various models. One fundamental
criterion is whether the model is synchronous or asynchronous [1, 3, 7, 11]. In a
synchronous model, we assume that there is a global clock and that the operations
of components take place simultaneously: at each clock tick an action takes place
on each process. This is an ideal timing model that does not represent what really
happens in most distributed systems. In fact, synchronization in networks may be
viewed as a control structure which makes it possible to control the relative
steps of different processes; it may be illustrated by the following examples:

1. the processes execute actions in lock-steps called pulses or rounds. In a pulse, 
a process p executes the following sequence of discrete steps: 

(a) p sends a message, 

(b) p receives some messages,

(c) p performs local computations; 

2. another assumption is that computation events of pulse p appear after com- 
putation events of pulse p — 1 and all messages sent in pulse p are delivered 
before computations events of pulse p + 1; 

3. if each process is equipped with a counter for local computations, we may 
assume that the difference between two counters is at most 1; and more gen- 
erally, for a given non negative integer k we may assume that the difference 
between any two counters is at most k; 

4. some algorithms need some synchronization barriers applied to a group of 
processes, it means that all group members are blocked until all processes of 
the group have reached this barrier. 

In the asynchronous model there is no global clock; separate components
take steps at arbitrary relative speeds. It is assumed that messages are delivered,
processes perform local computations and send messages, but no assumption is
made about how long any of this may take. There also exist intermediate models, such as
models assuming knowledge of bounds on the relative speeds of processes or
links. This paper presents several methods for the simulation of synchrony on
asynchronous distributed systems by means of local computations.

1.1 The Model 

We consider networks of processes with arbitrary topology. A network is rep- 
resented as a connected, undirected graph where vertices denote processes and 
edges denote direct communication links. Labels are attached to vertices and 
edges. The identities of the vertices, a distinguished vertex, the number of pro- 
cesses, the diameter of the graph or the topology are examples of labels attached 
to vertices; weights, marks for encoding a spanning tree or the sense of direction 
are examples of labels attached to edges. 

Labels are modified locally, that is, in general, on star graphs or on edges
of the given graph, according to certain rules depending on the subgraph only
(local computations). The relabelling is performed until no more transformation
is possible, i.e., until a normal form is obtained.

The model of local computations has several advantages:

— it gives an abstract model to think about some problems in the field of 
distributed computing independently of the wide variety of models used to 
represent distributed systems [5], 

— it is easier to understand and to explain problems, to compute their solutions 
or to obtain results of impossibility, 

— any positive solution in this model may guide the research of a solution in 
a weaker model or be implemented in a weaker model using randomized 
algorithms, 

— this model gives nice properties and examples using classical combinatorial 
material. 



1.2 Synchronizers 

A synchronous distributed system is organized as a sequence of pulses: in a 
pulse each process performs a local computation. In an asynchronous system the 
speed of processes can vary, there is no bounded delay between consecutive steps 
of a process. A synchronizer is a mechanism that transforms an algorithm for 
synchronous systems into an algorithm for asynchronous systems. 

As the non-determinism in synchronous systems is weaker, algorithms
for synchronous systems are in general easier to design and to analyze than those
for asynchronous networks. In asynchronous systems, it is difficult to deal with
the absence of a global synchronization of processes. Consequently, it is useful
to have a general method to transform an algorithm for synchronous networks
into an algorithm for asynchronous networks. It then becomes possible to
design a synchronous algorithm, test it and analyze it, and then use the standard
method to implement it on an asynchronous network. A synchronizer operates
by generating a sequence of local clock pulses at each process. An introduction
and the main results about synchronizers may be found in [1, 3, 7, 9, 11].

1.3 The Main Results 

In this paper we focus on efficient synchronizers where the aim is to reduce 
the variation of pulses between the processes of the network. We discuss several 
synchronizers according to the assumptions made on the network. We consider 
mainly three classes of networks. The first class consists of networks whose sizes 
are known. That is, each process has a local knowledge containing the size of 
the network. For this class of networks, we present two types of synchronizers 
based on two different techniques. The first one uses the SSP algorithm [12] that 
is commonly used to detect stable properties. The second one is a randomized 
procedure based on the use of random walks on graphs. The final class of networks
we deal with is the class of tree-shaped networks. In fact, if the network is a tree, 
the synchronization protocol begins by electing a vertex which will afterwards 
ensure the synchronization of the whole network. 

1.4 Summary 

The paper is organized as follows. Section 2 recalls briefly several definitions of 
local computations and introduces their use to describe distributed algorithms. 
Section 3 presents important properties of synchronizers. A simple synchronizer 
is given in Section 4. Section 5 deals with synchronizers for networks with known 
size and Section 6 presents a synchronizer for trees. A general method for 
building and using synchronizers is discussed in Section 7. 

2 Definitions and Notations 

2.1 Undirected Graphs 

We only consider finite, undirected and connected graphs without multiple edges
and self-loops. If $G$ is a graph, then $V(G)$ denotes the set of vertices and $E(G)$
denotes the set of edges; two vertices $u$ and $v$ are said to be adjacent if $\{u,v\}$
belongs to $E(G)$. The distance between two vertices $u, v$ is denoted $d(u,v)$.

Let $v$ be a vertex and $k$ a non-negative integer. We denote by $B_G(v,k)$, or
briefly $B(v,k)$, the centered ball of radius $k$ with center $v$, i.e., the subgraph of
$G$ defined by the vertex set $V' = \{v' \in V(G) \mid d(v,v') \le k\}$ and the edge set
$E' = \{\{v_1,v_2\} \in E(G) \mid d(v,v_1) \le k \text{ and } d(v,v_2) \le k\}$.
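The definition of the ball can be made concrete with a breadth-first search. The following is a minimal Python sketch; the adjacency-dictionary representation of the graph and the names `ball` and `path` are assumptions made for illustration and do not come from the paper.

```python
from collections import deque


def ball(adj, v, k):
    """Return (vertices, edges) of the ball B(v, k) of the graph `adj`.

    `adj` maps each vertex to the list of its neighbours (hypothetical
    representation chosen for this sketch).
    """
    dist = {v: 0}
    queue = deque([v])
    while queue:
        x = queue.popleft()
        if dist[x] == k:
            continue  # do not explore beyond radius k
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    vertices = set(dist)
    # Keep the edges of G whose two endpoints are both within distance k of v.
    edges = {frozenset((x, y)) for x in vertices for y in adj[x] if y in vertices}
    return vertices, edges


# Example graph: the path 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
star_vertices, star_edges = ball(path, 1, 1)
```

With $k = 1$ the ball is the star centered at the vertex, which is the subgraph on which the relabelling rules of the following sections act.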

2.2 Graph Relabelling Systems for Encoding Distributed Computation

In this subsection, we illustrate, in an intuitive way, the notion of graph rela- 
belling systems by showing how some algorithms on networks of processes may 
be encoded within this framework [6]. As usual, such a network is represented 
by a graph whose vertices stand for processes and edges for (bidirectional) links
between processes. At any time, each vertex and each edge is in some particular
state and this state will be encoded by a vertex or edge label. According to its
own state and to the states of its neighbours (or a neighbour), each vertex may
decide to perform an elementary computation step. After this step, the states of
this vertex, of its neighbours and of the corresponding edges may have changed
according to some specific computation rules. Let us recall that graph relabelling
systems satisfy the following requirements:

(C1) they do not change the underlying graph but only the labelling of its
components (edges and/or vertices), the final labelling being the result, 
(C2) they are local, that is, each relabelling changes only a connected subgraph 
of a fixed size in the underlying graph, 

(C3) they are locally generated, that is, the applicability condition of the rela- 
belling only depends on the local context of the relabelled subgraph. 

A precise description and definition of local computations can be found in
[4]. We recall here only the description of local computations, and we explain the
convention under which we will describe graph relabelling systems later. If the
number of rules is finite then we will describe all rules by their preconditions
and relabellings. We will also describe a family of rules by a generic rule scheme
(“meta-rule”). In this case, we will consider a generic star-graph with generic
center $v_0$ and generic set of vertices $B(v_0,1)$. If $\lambda(v)$ is the label of $v$ in the
precondition, then $\lambda'(v)$ will be its label in the relabelling. We will omit in the
description labels that are not modified by the rule. This means that if $\lambda(v)$ is
a label such that $\lambda'(v)$ is not explicitly described in the rule for a given $v$, then
$\lambda'(v) = \lambda(v)$. In all the examples of graph relabelling systems that we consider
in this paper the edge labels are never changed.

An Example 

The following relabelling system performs the election algorithm in the family
of tree-shaped networks. The set of labels is $L = \{N, \text{elected}, \text{non-elected}\}$. The
initial label on all vertices is $l_0 = N$.

R1 : Pruning rule

Precondition :

• $\lambda(v_0) = N$,

• $\exists!\, v \in B(v_0,1),\ v \ne v_0,\ \lambda(v) = N$.

Relabelling :

• $\lambda'(v_0) := \text{non-elected}$.

R2 : Election rule

Precondition :

• $\lambda(v_0) = N$,

• $\forall v \in B(v_0,1),\ v \ne v_0,\ \lambda(v) \ne N$.

Relabelling :

• $\lambda'(v_0) := \text{elected}$.
Let us call a pendant vertex any vertex labelled N having exactly one neighbour
with the label N. There are two meta-rules R1 and R2. The meta-rule R1
consists in cutting a pendant vertex by giving it the label non-elected. The label
N of a vertex v becomes elected by the meta-rule R2 if the vertex v has no
neighbour labelled N. A complete proof of this system may be found in [6].
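To illustrate how such a relabelling system runs, here is a minimal Python sketch of the two meta-rules. It is a sequential simulation only: the random scheduler, the adjacency-dictionary encoding and the names `elect_in_tree` and `tree` are assumptions of this sketch, not part of the original system.

```python
import random


def elect_in_tree(adj, seed=0):
    """Apply the pruning rule R1 and the election rule R2 until no rule applies.

    `adj` maps each vertex of a tree to the list of its neighbours.
    Returns the final labelling (a normal form of the relabelling system).
    """
    rng = random.Random(seed)
    label = {v: "N" for v in adj}
    while True:
        applicable = []
        for v0 in adj:
            if label[v0] != "N":
                continue
            n_neighbours = sum(1 for v in adj[v0] if label[v] == "N")
            if n_neighbours == 1:
                applicable.append((v0, "non-elected"))  # R1: v0 is pendant
            elif n_neighbours == 0:
                applicable.append((v0, "elected"))      # R2: no N-neighbour left
        if not applicable:
            return label
        # One rule application per step, chosen arbitrarily by the scheduler.
        v0, new_label = rng.choice(applicable)
        label[v0] = new_label


# Example tree with 5 vertices.
tree = {0: [1, 2], 1: [0, 3, 4], 2: [0], 3: [1], 4: [1]}
final_labels = elect_in_tree(tree)
```

Whatever order the scheduler picks, the N-labelled vertices always form a subtree, so the run ends with exactly one elected vertex; which vertex is elected depends on the order of rule applications.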



3 Synchronizer Properties 

Throughout the rest of this work, several distributed synchronization protocols 
will be described. 

The operations of processes take place in a sequence of discrete steps called 
pulses: we represent a pulse by a counter, then we associate to each process a 
pulse number which is initialized to 0 or 1. At each step, a process goes from 
pulse i to pulse i + 1. 

All of these protocols involve synchronizing the system at every synchronous 
round. This is necessary because the protocols are designed to work for arbitrary 
synchronous algorithms. All the synchronizers we will build are “global”, in the 
sense that they involve synchronization among arbitrary nodes in the whole 
network. To preserve this “global” synchronization, each synchronizer has to 
satisfy some properties. The essential property we seek to preserve in translating 
a generic synchronous algorithm $A_s$ into an asynchronous algorithm $A_{as}$ is that
the pulse difference between two arbitrary nodes is at most 1 (Main Theorem).
In order to ensure that this property holds for all nodes and at any time or 
pulse, we begin by requiring pulse compatibility in the network. This means that 
a node can only increase its pulse, when it is sure that there is no node in the 
network that is still in a lower pulse. This property is guaranteed by the validity 
of Theorem 2. Furthermore, we will strengthen our synchronization assumption 
by forcing pulse convergence at any time. By pulse convergence we mean the fact 
that all the vertices of a network have simultaneously to be in pulse n before any 
node starts the pulse n + 1. Pulse convergence is stated by Theorem 1. Thus, 
the correctness of Theorem 3 ( Main Theorem) can be deduced from the pulse 
compatibility and the pulse convergence. 

The correctness of our synchronizers will then depend on the validity of the 
following three theorems: 

Theorem 1 (Convergence Theorem). Let $\pi$ ($\pi > 0$) be the maximum pulse
that has been reached so far. After a finite number of steps $T_\pi > 0$, all the vertices
of $G$ are in the same pulse $\pi$.

Theorem 2 (Pulse compatibility Theorem). A vertex $u$ in $G$ changes its
pulse $p(u)$ only when there is no node $v$ in $G$ such that $p(v) < p(u)$.

Theorem 3 (Main Theorem). At any time $t$, the pulse difference between
two vertices $v$ and $u$ of a network $G$ is at most 1.
4 A Simple Synchronizer

We recall here the synchronization as presented and used in [10]. On each vertex
$v$ of a graph there is a counter $p(v)$; the initial value of $p(v)$ is 0. At each step the
value of the counter $p(v_0)$ depends on the values of the counters of the neighbours
of $v_0$. More precisely, if $p(v_0) = i$ and if for each neighbour $v$ of $v_0$, $p(v) = i$ or
$p(v) = i + 1$, then $v_0$ is considered as safe and the new value of $p(v_0)$ is $i + 1$.

R1 : The synchronization rule

Precondition :

• $p(v_0) = i$,

• $\forall v \in B(v_0,1)$: $p(v) = i$ or $p(v) = i + 1$.

Relabelling :

• $p(v_0) := i + 1$.



A simple induction on the distance between vertices proves that:
Proposition 1. For all vertices $v_1$ and $v_2$, $|p(v_1) - p(v_2)| \le d(v_1,v_2)$.



Remark 1. This synchronization does not need any knowledge on the graph, in 
particular on its size. 
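The behaviour of this rule can be explored with a small sequential simulation. The Python sketch below applies the synchronization rule at randomly chosen safe vertices and checks the bound of Proposition 1 after every step. The random scheduler, the graph encoding and all function names are assumptions of this sketch.

```python
import random
from collections import deque


def distances_from(adj, source):
    """Breadth-first search distances from `source` to every vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def satisfies_proposition_1(adj, p):
    """Check |p(v1) - p(v2)| <= d(v1, v2) for every pair of vertices."""
    for v1 in adj:
        dist = distances_from(adj, v1)
        if any(abs(p[v1] - p[v2]) > dist[v2] for v2 in adj):
            return False
    return True


def run_simple_synchronizer(adj, steps, seed=0):
    """Apply the synchronization rule `steps` times at random safe vertices."""
    rng = random.Random(seed)
    p = {v: 0 for v in adj}
    for _ in range(steps):
        # v0 is safe when every neighbour is at pulse p(v0) or p(v0) + 1.
        safe = [v0 for v0 in adj
                if all(p[v] in (p[v0], p[v0] + 1) for v in adj[v0])]
        v0 = rng.choice(safe)  # a vertex of minimum pulse is always safe
        p[v0] += 1
        assert satisfies_proposition_1(adj, p)
    return p


# Example: a cycle with 6 vertices (diameter 3).
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
pulses = run_simple_synchronizer(cycle, steps=200)
```

On this cycle the pulses of two vertices can differ by as much as the diameter, which is why this rule alone does not give the "global" synchronization sought in the paper.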

To implement this synchronization, a counter modulo 3 is sufficient: each
process needs to compare the value of its counter to the value of each neighbour.
More precisely, for each process $v_0$ and for each neighbour $v$ of $v_0$ we determine whether
$p(v_0) = p(v) - 1$, or $p(v_0) = p(v)$, or $p(v_0) = p(v) + 1$. Finally, the synchronization
may be encoded by:

R1 : The synchronization rule

Precondition :

• $p(v_0) = i$,

• $\forall v \in B(v_0,1)$: $p(v) = i \bmod 3$ or $p(v) = (i + 1) \bmod 3$.

Relabelling :

• $p(v_0) := (i + 1) \bmod 3$.
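The reason a counter modulo 3 suffices is that the counters of two neighbours never differ by more than 1, so the three possible relations remain distinguishable after reduction modulo 3. A minimal Python sketch of this comparison follows; the function names and the returned strings are illustrative assumptions.

```python
def relation_mod3(c0, c):
    """Relate a neighbour's counter `c` to one's own counter `c0`.

    Both counters are stored modulo 3; the true values are assumed to
    differ by at most 1.
    """
    diff = (c - c0) % 3
    return {0: "equal", 1: "neighbour ahead", 2: "neighbour behind"}[diff]


def is_safe_mod3(c0, neighbour_counters):
    """Precondition of the rule: every neighbour is at i or i + 1 (mod 3)."""
    return all((c - c0) % 3 in (0, 1) for c in neighbour_counters)


# True pulses 5 and 6 are stored as 2 and 0.
example = relation_mod3(5 % 3, 6 % 3)
```

A counter modulo 2 would not be enough: "ahead by one" and "behind by one" would both be stored as the same value.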

The $\alpha$ synchronizer [11] is similar to the synchronizer presented in this section.
In fact, the precondition of rule R1 expresses that the vertex $v_0$ is safe. In
contrast to the $\alpha$ synchronizer, a vertex $v_0$ generates the next pulse as soon as
it is safe. It does not wait until all its neighbours become safe. This synchronizer
guarantees neither the pulse compatibility theorem nor the convergence
theorem. Thus, it does not satisfy Theorem 3. It is therefore not appropriate
for performing a “global” synchronization in a network. The next section is
devoted to the presentation of synchronizers that are able to preserve a “global”
synchronization at every pulse.
5 Graphs with Known Size

5.1 The SSP Algorithm 

We describe in our framework the algorithm by Szymanski, Shi and Prywes (the 
SSP algorithm for short) and then the synchronizer. 

We consider a distributed algorithm which terminates when all processes 
reach their local termination conditions. Each process is able to determine only 
its own termination condition. The SSP algorithm detects an instant in which 
the entire computation is achieved. 

Let $G$ be a graph such that a boolean predicate $P(v)$ and an integer $a(v)$ are
associated with each node $v$ in $G$. Initially $P(v)$ is false (the local termination
condition is not reached) and $a(v)$ is equal to $-1$. Transformations of the value
of $a(v)$ are defined by the following rules.

Each local computation acts on the integer $a(v_0)$ associated to the vertex $v_0$;
the new value of $a(v_0)$ depends on values associated to vertices of $B(v_0,1)$. More
precisely, let $v_0$ be a vertex and let $\{v_1, \ldots, v_d\}$ be the set of vertices adjacent to $v_0$.

— If $P(v_0) = \text{false}$ then $a(v_0) = -1$;

— if $P(v_0) = \text{true}$ then $a(v_0) = 1 + \min\{a(v_k) \mid 0 \le k \le d\}$.

We consider the following assumption: 

for each node v the value of P(y) eventually becomes true and remains true 
for ever. 

We will use the following notation. Let $(G_i)_{0 \le i}$ be a relabelling chain associated
to SSP’s algorithm. We denote by $a_i(v)$ (resp. $P_i(v)$) the integer (resp. the
boolean) associated to the vertex $v$ of $G_i$. We have [12]:

Proposition 2. Let $(G_i)_{0 \le i \le n}$ be a relabelling chain associated to SSP’s algorithm;
let $v$ be a vertex of $G$, and suppose that $h = a_i(v) \ge 0$. Then:

$\forall w \in V(G)\quad d(v,w) \le h \Rightarrow a_i(w) \ge 0$.

From this property we deduce that if a node knows the size of the graph (or a 
bound of the size, or the diameter or a bound of the diameter) then it can detect 
when P(v) is true for all vertices of the graph. 
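As an illustration of this detection principle, the following Python sketch simulates the SSP computation on a small graph and stops as soon as some vertex reaches a value of at least the diameter. The random scheduler, the random moment at which each predicate becomes true, and the names `run_ssp`, `diameter` and `max_steps` are assumptions of this sketch.

```python
import random


def run_ssp(adj, diameter, max_steps=5000, seed=0):
    """Simulate the SSP rule until a vertex detects global termination.

    Returns (detecting vertex or None, predicates P, counters a).
    """
    rng = random.Random(seed)
    vertices = list(adj)
    P = {v: False for v in adj}
    a = {v: -1 for v in adj}
    for _ in range(max_steps):
        v0 = rng.choice(vertices)
        # Assumption: the local termination condition of v0 becomes true
        # at some random moment and then stays true for ever.
        if not P[v0] and rng.random() < 0.5:
            P[v0] = True
        if P[v0]:
            # The minimum is taken over v0 itself and its neighbours.
            a[v0] = 1 + min(a[v] for v in [v0] + adj[v0])
        else:
            a[v0] = -1
        if a[v0] >= diameter:
            return v0, P, a  # by Proposition 2, P holds everywhere
    return None, P, a


# Example: the path 0 - 1 - 2 - 3, whose diameter is 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
detector, P, a = run_ssp(path, diameter=3)
```

The detecting vertex only uses its own counter and the bound on the diameter; it never inspects the predicates of distant vertices.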

5.2 A Synchronizer by Using the SSP Computations 

In this section we introduce a new synchronization protocol. This protocol is
based on the SSP algorithm. Let $u$ be a vertex; the integer $p(u)$ denotes the value
of the pulse associated to the vertex $u$. In our setting, a vertex $v$ satisfies
the stable property if there is no node $u \in B_G(v,1)$ such that $p(u) \ne p(v)$.
Before giving a formal description of this algorithm, we explain the way
it works.

Let $G$ be a graph with diameter $D$. To each node $v$ of $G$, we associate two
integers $p(v)$ and $a(v)$, where $p(v)$ denotes the pulse and $a(v)$ denotes the SSP
value for the vertex $v$. Initially, $p(v)$ and $a(v)$ are respectively set to 1 and 0.
A vertex $v$ can start the next pulse ($p(v) := p(v) + 1$), and its counter $a(v)$ is
reset to 0, when it detects that the value of the pulse of all the vertices is equal
to $p(v)$. Now, we give a formal description of the above arguments.

Consider a labelling function $\lambda$, where $\lambda : V \to [1..\infty] \times [0..D]$. Initially
all vertices are labelled $(1,0)$. The graph rewriting system is $\mathcal{R}_1 = (L_1, I_1, P_1)$
defined by $L_1 = \{[1..\infty] \times [0..D]\}$, $I_1 = \{(1,0)\}$, $P_1 = \{R1, R2\}$, where R1 and
R2 are the relabelling rules described below. Consider a vertex $v_0$; the SSP
synchronization is defined by:

R1 : The observation rule

Precondition :

• $\lambda(v_0) = (p(v_0), a(v_0))$,

• $a(v_0) < D$,

• $\forall v \in B(v_0,1)$: $p(v) \ge p(v_0)$,

• $a(v_0) = \min\{a(v) \mid v \in B(v_0,1) \text{ and } p(v) = p(v_0)\}$.

Relabelling :

• $\lambda'(v_0) := (p(v_0), a(v_0) + 1)$.

R2 : The changing phase rule

Precondition :

• $\lambda(v_0) = (p(v_0), D)$.

Relabelling :

• $\lambda'(v_0) := (p(v_0) + 1, 0)$.

We will use the following notation. Let $(G_i)_{0 \le i}$ be a relabelling chain associated
to the SSP synchronization. We denote by $p_i(v)$ (resp. $a_i(v)$) the pulse
(resp. the integer) associated to the vertex $v$ of $G_i$.

For the correctness of this algorithm we state some invariants.

Fact 1 $p_{i+1}(v) \ge p_i(v)$.

Fact 2 If $p_{i+1}(v) = p_i(v) + 1$ then $a_i(v) = D$.

Lemma 1. If $p_i(v) = \pi$ and $a_i(v) = h$ then:

$\forall w \in V(G)\quad d(v,w) \le h \Rightarrow p_{i-h}(w) \ge \pi$.

Proof. We show the lemma by induction on $i$. If $i = 0$ the property is obvious.

First we assume that $p_{i+1}(v) = p_i(v) = \pi$ and $a_{i+1}(v) = a_i(v) = h$. By the
inductive hypothesis: $d(v,w) \le h \Rightarrow p_{i-h}(w) \ge \pi$. From Fact 1,
$p_{i-h+1}(w) \ge p_{i-h}(w) \ge \pi$. Thus $p_{(i+1)-h}(w) \ge \pi$.

Now we assume that $p_{i+1}(v) = p_i(v) = \pi$ and $a_{i+1}(v) = a_i(v) + 1 = h$. If
$d(v,w) \le h = a_i(v) + 1$ then let $u$ be such that $d(v,u) = 1$ and $d(u,w) = a_i(v) = h - 1$.

We have $a_{i+1}(v) = a_i(v) + 1 \Rightarrow p_i(u) \ge p_i(v) \ge \pi$ and $a_i(u) \ge h - 1$ (by
the precondition of the observation rule). By the inductive hypothesis applied
to the vertex $u$, $p_{i-(h-1)}(w) \ge \pi$ and finally $p_{(i+1)-h}(w) \ge \pi$.

The last case is $p_{i+1}(v) = p_i(v) + 1$; necessarily $a_{i+1}(v) = 0$, and this completes
the proof.
From this lemma and Fact 1, it follows:

Corollary 1. If $p_i(v) = \pi$ and $a_i(v) = h$ then $d(v,w) \le h \Rightarrow p_i(w) \ge \pi$.

Lemma 2. If $p_i(v) = \pi$ and $p_i(w) = \pi + 1$ then $\forall u \in V(G)$ ($p_i(u) = \pi$ or
$p_i(u) = \pi + 1$).

Proof. Let $j$ be such that $p_j(w) = \pi$ and $p_{j+1}(w) = \pi + 1$; by the precondition
of the phase rule and the previous corollary:

$\forall u \in V(G)\quad p_j(u) \ge \pi$.

For the same reasons, as $p_i(v) = \pi$, there does not exist $u$ such that $p_i(u) > \pi + 1$.
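The observation rule and the changing phase rule can be tried out with a sequential simulation. The Python sketch below applies one applicable rule per step, chosen at random, and records the largest pulse difference ever observed, which Lemma 2 bounds by 1. The scheduler, the graph encoding and the function names are assumptions of this sketch; the diameter D is passed in as known data.

```python
import random


def applicable_rules(adj, p, a, D):
    """List the pairs (vertex, rule) whose precondition currently holds."""
    rules = []
    for v0 in adj:
        if a[v0] == D:
            rules.append((v0, "R2"))  # changing phase rule
            continue
        # Counters of the vertices of B(v0, 1) that are in the pulse of v0.
        same_pulse = [a[v] for v in [v0] + adj[v0] if p[v] == p[v0]]
        if all(p[v] >= p[v0] for v in adj[v0]) and a[v0] == min(same_pulse):
            rules.append((v0, "R1"))  # observation rule
    return rules


def run_ssp_synchronizer(adj, D, steps, seed=0):
    """Apply `steps` rules; return pulses, counters and the largest pulse gap."""
    rng = random.Random(seed)
    p = {v: 1 for v in adj}
    a = {v: 0 for v in adj}
    max_spread = 0
    for _ in range(steps):
        v0, rule = rng.choice(applicable_rules(adj, p, a, D))
        if rule == "R1":
            a[v0] += 1
        else:
            p[v0], a[v0] = p[v0] + 1, 0
        max_spread = max(max_spread, max(p.values()) - min(p.values()))
    return p, a, max_spread


# Example: a cycle with 5 vertices, whose diameter is 2.
cycle = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
p, a, max_spread = run_ssp_synchronizer(cycle, D=2, steps=500)
```

A vertex of minimum pulse and, among those, of minimum counter always satisfies one of the two preconditions, so the simulation never blocks.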



5.3 Randomized Synchronization Algorithm 

The main idea of this algorithm is based on the use of random walks in a network. 
Initially, each node (process) gets a token. At each step, a node that has a token 
passes it randomly to one of its neighbors. When more than one token meet at 
one node, they merge to one token. In a connected undirected graph, with high 
probability, all tokens will merge to one token and the node, that gets it, starts 
the next synchronization pulse. 

For our purpose, we have slightly modified the above described algorithm. 
The motivation of this departure from the main idea is well grounded since it can 
be quite laborious for a node, that has a token, to know that there is no other 
token in the whole network. In order to avoid this kind of problem, we represent 
our tokens as natural numbers and the action of merging tokens is now done 
by adding the numbers corresponding to these tokens. Moreover, a node does 
not only pass one token to one of its neighbors. Rather, it first merges all the 
tokens of its neighborhood to one and passes the resulting number to one of its 
neighbors. At the beginning of each pulse, each vertex produces a new token
with value 1. As soon as only one token is left, we try (if possible) to broadcast
this information through the whole graph (rule R3). Thus, more than one node
could start the next pulse. If a node $u$ has a token $c(u)$ such that $c(u) = |V|$,
then $u$ is allowed to start the next synchronization pulse. We assume that each
node knows the size of the network. A formal description of the algorithm is
given below.

Let $G$ be a graph with $n$ vertices. Consider a labelling function $\lambda$, where
$\lambda : V \to [1..\infty] \times [0..n]$. The first item of the label of a node $u$ represents the
pulse of $u$ and the second represents the minimum number of vertices that are in
the same pulse as $u$. Initially, at least one vertex is labeled $(1,1)$ and the others
are $(0,0)$-labeled. The graph rewriting system is $\mathcal{R}_2 = (L_2, I_2, P_2)$ defined by
$L_2 = \{[1..\infty] \times [0..n]\}$, $I_2 = \{(1,1)\}$, $P_2 = \{R1, R2, R3, R4\}$, where R1, R2, R3
and R4 are the relabelling rules listed below. Let $v_0$ be a vertex; the randomized
synchronization algorithm (RS algorithm in the sequel) is described as follows.







Yves Metivier et al. 



R1 : The convergence rule

Precondition :

• $\lambda(v_0) = (p(v_0), c(v_0))$,

• $c(v_0) < |V|$,

• $\exists v \in B(v_0,1)$: $p(v) = p(v_0) + 1$.

Relabelling :

• $\lambda'(v_0) := (p(v_0), |V|)$.

In the convergence rule, each vertex $v_0$ sets the value of its token $c(v_0)$ to
$|V|$ as soon as there exists one vertex $v \in B_G(v_0,1)$ that is in a greater
pulse than $p(v_0)$.

R2 : The phase rule

Precondition :

• $\lambda(v_0) = (p(v_0), c(v_0))$,

• $c(v_0) = |V|$.

Relabelling :

• $\lambda'(v_0) := (p(v_0) + 1, 1)$.

In the phase rule, each vertex $v_0$ that has a token $c(v_0)$ such that $c(v_0) =
|V|$ increases its pulse number and sets the value of its token to 1.

R3 : The propagation rule

Precondition :

• $\lambda(v_0) = (p(v_0), c(v_0))$,

• $c(v_0) < |V|$,

• $\exists v \in B(v_0,1)$: $c(v) = |V|$ and $p(v) = p(v_0)$.

Relabelling :

• $\lambda'(v_0) := (p(v_0), |V|)$.

In the propagation rule, if a vertex $v$ in the neighborhood of $v_0$ is in the same
pulse $p(v_0)$ and $c(v) = |V|$, then the token of $v_0$ is set to $|V|$. The only purpose
of this rule is to broadcast the token information $c(v) = |V|$.

R4 : The collecting rule

Precondition :

• $\lambda(v_0) = (p(v_0), c(v_0))$,

• $c(v_0) < |V| \wedge c(v_0) > 0$,

• $\forall v \in B(v_0,1)$: $p(v) = p(v_0)$.

Relabelling :

• $w_0 :=$ choose at random $v \in B(v_0,1)$ ($v \ne v_0$),

• $\forall v \in B(v_0,1)$ ($v \ne w_0$): $\lambda'(v) := (p(v), 0)$,

• $S := \sum_{v \in B(v_0,1)} c(v)$,

• $\lambda'(w_0) := (p(w_0), S)$.

In the collecting rule, each vertex in the neighborhood of $v_0$ is in the same
pulse $p(v_0)$. If $v_0$ has a token of value $c(v_0) < |V|$, then the sum of all
the tokens in $B_G(v_0,1)$ is computed and the resulting token is sent to one
neighbor $w_0$ of $v_0$. Note that the choice of $w_0$ is made randomly. The tokens
of all the vertices $v \in B_G(v_0,1)$ such that $v \ne w_0$ are set to 0.
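The four rules of the RS algorithm can be sketched as a sequential simulation in Python. One applicable rule is chosen uniformly at random at each step, which is only one possible scheduler and is an assumption of this sketch, as are the graph encoding and the function names. The initial labelling follows the text: one vertex labelled (1, 1) and all others (0, 0).

```python
import random


def applicable_rules(adj, p, c, n):
    """List the pairs (vertex, rule) whose precondition currently holds."""
    rules = []
    for v0 in adj:
        nbrs = adj[v0]
        if c[v0] == n:
            rules.append((v0, "R2"))  # phase rule
            continue
        if any(p[v] == p[v0] + 1 for v in nbrs):
            rules.append((v0, "R1"))  # convergence rule
        if any(c[v] == n and p[v] == p[v0] for v in nbrs):
            rules.append((v0, "R3"))  # propagation rule
        if c[v0] > 0 and all(p[v] == p[v0] for v in nbrs):
            rules.append((v0, "R4"))  # collecting rule
    return rules


def run_rs(adj, steps, seed=0):
    """Apply `steps` rules; return pulses, tokens and the largest pulse gap."""
    rng = random.Random(seed)
    n = len(adj)
    first = next(iter(adj))
    p = {v: 0 for v in adj}
    c = {v: 0 for v in adj}
    p[first], c[first] = 1, 1
    max_spread = 0
    for _ in range(steps):
        v0, rule = rng.choice(applicable_rules(adj, p, c, n))
        if rule in ("R1", "R3"):
            c[v0] = n
        elif rule == "R2":
            p[v0], c[v0] = p[v0] + 1, 1
        else:
            # R4: merge all tokens of B(v0, 1) and hand them to a random neighbour.
            w0 = rng.choice(adj[v0])
            total = c[v0] + sum(c[v] for v in adj[v0])
            for v in [v0] + adj[v0]:
                c[v] = 0
            c[w0] = total
        max_spread = max(max_spread, max(p.values()) - min(p.values()))
    return p, c, max_spread


# Example: the path 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
p, c, max_spread = run_rs(path, steps=3000)
```

The recorded gap corresponds to item 6 of Lemma 3 below; how quickly the pulses advance depends on how fast the random walks merge the tokens.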
We now turn our attention to the correctness of the described algorithm.
We first state some useful properties that will help us to show the
validity of our synchronization assumption. Note that a node
that does not have a token gets a pseudo-token with value 0. Let $(G_i)_{0 \le i}$ be a
relabelling chain associated to a run of the RS algorithm. We denote by $p_i(v)$
(resp. $c_i(v)$) the pulse (resp. the integer) associated to the vertex $v$ of $G_i$. We
assume that initially exactly one vertex of $G_0$ is labeled $(1,1)$ and all the other
vertices are labeled $(0,0)$. We now examine the properties of the algorithm.

First we have the two following facts.

Fact 3 $\forall v \in V(G)\ \forall i\quad p_{i+1}(v) \ge p_i(v)$.

Fact 4 $\forall v \in V(G)\ \forall i\quad p_{i+1}(v) = p_i(v) + 1 \Rightarrow c_i(v) = |V|$.

Lemma 3. Let $\pi$ be a pulse; we have:

1. If there exists $v_0$ such that $0 < c_i(v_0) < |V|$ and $p_i(v_0) = \pi$ then:

$\sum_{\{v \mid p_i(v) = \pi\}} c_i(v) = \mathrm{Card}\{v \mid p_i(v) = \pi\}$.

2. If there does not exist $v_0$ such that $0 < c_i(v_0) < |V|$ and $p_i(v_0) = \pi$ then:

$\forall v$ such that $p_i(v) = \pi$, either $c_i(v) = 0$ or $c_i(v) = |V|$.

Furthermore, if for all $v$ such that $p_i(v) = \pi$ we have $c_i(v) = 0$, then there
exists $w$ such that $p_i(w) = \pi + 1$.

3. ($\exists v, w$ such that $p_i(v) = p_i(w) + 1$) $\Rightarrow$ ($c_i(w) = 0$ or $c_i(w) = |V|$).

4. If $\exists v, w$ such that $p_i(v) = p_i(w) + 1$ then

$\sum_{\{u \mid p_i(u) = p_i(v)\}} c_i(u) = \mathrm{Card}\{u \mid p_i(u) = p_i(v)\}$.

5. If there exists a vertex $v$ such that $c_i(v) = |V|$ then for every vertex $u$ we
have: $p_i(u) \ge p_i(v)$.

6. $\forall u, v\quad |p_i(u) - p_i(v)| \le 1$.

Proof. The proof is by induction on i. We assume that all the properties are 
true at step i and then we examine what happens according to the rewriting 
rule which is applied. 

For the validity of our synchronization assumption we have to show that each 
node of G has enough knowledge of the whole graph to decide, if necessary, to 
change its pulse. This knowledge can only be achieved if we are able to ensure 
that after a finite time, and with high probability, there is only one token left in 
the whole network. For lack of space, the main results on random walks
and the proofs of Theorem 1, Theorem 2 and Theorem 3 are omitted here; they can be found in [8].
6 Trees

All the synchronization protocols developed so far apply to arbitrary
graphs. These protocols need adequate knowledge of some graph characteristics
to be exact and faultless. Now we introduce a new methodology devoted to trees.
Although this methodology is restricted to trees, it has the advantage that it does
not need any additional knowledge (size, diameter or existence of a distinguished
node).

The main idea of this protocol resides in the use of an election algorithm
[2] to decide which node should start the next pulse. This algorithm solves the
problem of the election in anonymous trees using graph relabeling systems. At
the beginning of the synchronization protocol, all vertices are in the same pulse.
Their labels have two items. The first one is needed for the election algorithm
and the second one represents the pulse number. We take advantage of the
election algorithm to choose a vertex $u$ that starts the next pulse. Furthermore,
all the nodes $v$ that have a node $w$ in their neighborhood such that $p(w) > p(v)$
increase their pulse. When a node increases its pulse, its election label is set back
to the initial value. A formal description of this synchronization protocol is
given below.

Let $T$ be a tree. Initially all vertices are labeled $(N,1)$. The graph rewriting
system is $\mathcal{R}_4 = (L_4, I_4, P_4)$ defined by $L_4 = \{\{L, N\} \times [1..\infty]\}$, $I_4 = \{(N,1)\}$,
$P_4 = \{R1, R2, R3\}$, where R1, R2 and R3 are the relabeling rules given below.

R1 : Leaf elimination rule

Precondition :

• $\lambda(v_0) = (\mathrm{label}(v_0), p(v_0))$,

• $\mathrm{label}(v_0) = N$,

• $\exists v \in N(v_0)$: $\mathrm{label}(v) = N$, $p(v) = p(v_0)$, and
$\forall w \in N(v_0) \setminus \{v\}$: $p(w) = p(v_0)$, $\mathrm{label}(w) = L$.

Relabelling :

• $\lambda'(v_0) := (L, p(v_0))$.

R2 : Tree election rule

Precondition :

• $\lambda(v_0) = (\mathrm{label}(v_0), p(v_0))$,

• $\mathrm{label}(v_0) = N$,

• $\forall v \in N(v_0)$: $p(v) = p(v_0)$, $\mathrm{label}(v) = L$.

Relabelling :

• $\lambda'(v_0) := (N, p(v_0) + 1)$.

R3 : The propagation rule

Precondition :

• $\lambda(v_0) = (\mathrm{label}(v_0), p(v_0))$,

• $\mathrm{label}(v_0) = L$,

• $\exists v \in N(v_0)$: $p(v) > p(v_0)$, $\mathrm{label}(v) = N$.

Relabelling :

• $\lambda'(v_0) := (N, p(v_0) + 1)$.
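The three rules of the tree protocol can also be sketched as a sequential simulation in Python. As before, the random choice of the rule to apply, the encoding of the tree and the function names are assumptions of this sketch.

```python
import random


def applicable_rules(adj, label, p):
    """List the pairs (vertex, rule) whose precondition currently holds."""
    rules = []
    for v0 in adj:
        nbrs = adj[v0]
        if label[v0] == "N" and all(p[v] == p[v0] for v in nbrs):
            n_count = sum(1 for v in nbrs if label[v] == "N")
            if n_count == 1:
                rules.append((v0, "R1"))  # leaf elimination
            elif n_count == 0:
                rules.append((v0, "R2"))  # tree election
        elif label[v0] == "L" and any(
                p[v] > p[v0] and label[v] == "N" for v in nbrs):
            rules.append((v0, "R3"))      # propagation
    return rules


def run_tree_synchronizer(adj, steps, seed=0):
    """Apply `steps` rules; return labels, pulses and the largest pulse gap."""
    rng = random.Random(seed)
    label = {v: "N" for v in adj}
    p = {v: 1 for v in adj}
    max_spread = 0
    for _ in range(steps):
        v0, rule = rng.choice(applicable_rules(adj, label, p))
        if rule == "R1":
            label[v0] = "L"
        elif rule == "R2":
            p[v0] += 1                    # the elected vertex starts the next pulse
        else:
            label[v0], p[v0] = "N", p[v0] + 1
        max_spread = max(max_spread, max(p.values()) - min(p.values()))
    return label, p, max_spread


# Example tree with 5 vertices.
tree = {0: [1, 2], 1: [0, 3, 4], 2: [0], 3: [1], 4: [1]}
label, p, max_spread = run_tree_synchronizer(tree, steps=300)
```

No vertex needs to know the size or the diameter of the tree: the election performed in each pulse plays the role of the global detection.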




Synchronizers for Local Computations



The proofs of the convergence theorem and of pulse compatibility can be found in [8].
Note that the synchronization protocol presented in this section can be extended
to graphs with a distinguished vertex. The idea is to compute a spanning tree,
then to run the protocol on the computed tree (all the details can be found in
[8]).



7 Building Synchronizers 

The main goal of this section is to give a methodology that transforms
the protocols introduced in the previous sections into operative synchronizers. All the
developed protocols assume the existence of a pulse generator at each node of
the network. This means that a node $v$ has a pulse variable $p(v)$, and it is
supposed to generate a sequence of local clock pulses by increasing the value
of $p(v)$ from time to time (i.e. $p(v) = 0, 1, 2, \ldots$). These pulses are supposed
to simulate the ticks of the global clock in the synchronous setting. Obviously,
these protocols used as stand-alone applications achieve nothing in an
asynchronous environment. For this reason, we introduce some definitions that
help us to give guarantees about the relationship between the
pulse values at neighboring nodes at various moments during the execution.

Definition 1. We denote by $t(v,p)$ the physical time at which $v$ has increased
its pulse to $p$. We say that $v$ is at pulse $p(v) = p$ (or at pulse $p$) during the time
interval $\tau(v,p) = \,]t(v,p), t(v,p+1)]$.

Since our network is fully asynchronous, we are not able to force all the vertices
to maintain the same pulse at all times. However, we know that it is possible
to guarantee a weaker form of compatibility between the pulses of neighboring
nodes in the network. This form of compatibility is stated in Definition 2.

Definition 2 (Pulse Compatibility). If node $v$ sends an original message $m$
to a neighbor $w$ during its pulse $p(v) = p$, then $m$ is received at $w$ during its
pulse $p(w) = p$ as well.

7.1 Methodology to Construct a Synchronizer 

To build a functioning synchronizer from each of our protocols, we have to change 
their specifications such that, given an algorithm 11 $ written for a synchronous 
network and a protocol u, it should be possible to combine IIs on top of v to 
yield a protocol 77 a = v(II$) that can be executed on an asynchronous net- 
work. IIa has two components: The original component and the synchronization 
component. Each of these components has its own local variables and messages 
at every vertex. The original component consists of the local variables and the 
messages of the original protocol 11 $, whereas the synchronization component 
consists of local synchronization variables and synchronization messages. 

As from now, we are going to show that the changes needed to build a syn- 
chronizer from any of our protocols can be done effortless. Further, we will show 
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that it is possible to take advantage of these changes to prove the correctness 
of the so constructed synchronizers. This proof will be done with respect to the 
definition given by David Peleg [9]. 

Conceptually, the modifications of our protocols are done in two steps. The 
first step affects the label attached to each vertex. In fact, we add three new 
items to each label. The first item consists of a boolean variable S. This variable 
decides which component {original component or synchronization component ) is 
active on each vertex. Only the rules of the active component can be applied 
on a vertex v. The other items are two buffers B p and B p _ \ that represent 
the messages that a node v, at pulse p(v), has to send respectively in pulses 
p{v) and p(v) — 1. We claim that a protocol p that is generated from the above 
modifications satisfies all the specifications of a synchronizer. In a general way, 
a protocol 77 a — n $ ) will work in two phases. 

Initially, the item $S_v$ is set to true for all nodes $v$, and therefore the
synchronization component is active (phase one). As soon as a node $v$ increases
its pulse $p(v)$, its variable $S_v$ is set to false. Hence, only rules from
$\Pi_S$ can be applied on $v$: the original component is active (phase two).
Each application of a rule of $\Pi_S$ sets the state of $S_v$ back to true. Thus
the synchronization component can begin to synchronize $v$ with all the other
nodes. This cycle is repeated until $\Pi_S$ stops $\Pi_A$ with any specific
request.
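
The alternation between the two phases can be illustrated with a small sketch.
It is only a toy model under stated assumptions: the class `Vertex`, its
methods and the function `run_cycles` are hypothetical names introduced here,
and the conditions that the real synchronization protocol checks on the
neighbors of $v$ before increasing the pulse are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """Toy fragment of a vertex label: pulse counter p(v) and flag S_v."""
    pulse: int = 0
    sync_active: bool = True  # S_v = true: synchronization component is active
    trace: list = field(default_factory=list)

    def synchronization_step(self) -> None:
        """Phase one: allowed only while S_v is true; increases the pulse."""
        if not self.sync_active:
            raise RuntimeError("synchronization rules are disabled while S_v is false")
        self.pulse += 1
        self.sync_active = False  # the pulse was increased, so S_v becomes false
        self.trace.append(("sync", self.pulse))

    def original_step(self) -> None:
        """Phase two: allowed only while S_v is false; re-enables phase one."""
        if self.sync_active:
            raise RuntimeError("original rules are disabled while S_v is true")
        self.sync_active = True  # a rule of the original protocol resets S_v
        self.trace.append(("original", self.pulse))


def run_cycles(vertex: Vertex, cycles: int) -> list:
    """Run the phase-one / phase-two cycle a fixed number of times."""
    for _ in range(cycles):
        vertex.synchronization_step()
        vertex.original_step()
    return vertex.trace
```

In this sketch an attempt to apply a rule of the inactive component raises an
error, which mirrors the statement that only the rules of the active component
can be applied on a vertex.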

7.2 A General Overview 

Both phases described in the previous section are summarized in Fig. 1. The rule
$R_1$ represents phase one and $R_2$ represents phase two. $A$ describes the set
of rules used in the synchronization protocol ($\nu$) and $\Omega$ is the set of
relabeling rules used in the algorithm $\Pi_S$. The first rule means that as long
as $S_v = \mathrm{True}$ holds, it is only allowed to synchronize $v$ with all the
nodes in the network. As soon as $v$ changes its pulse, the second phase is
started for $v$. In this phase, it is no longer allowed to synchronize, but it is
permitted to execute rules of the algorithm $\Pi_S$. After the execution of
$\Pi_S$, rule one is active once again, and the cycle goes on. All our
synchronizers will have to use a rule that plays the role of $R_2$. This rule
should ensure a good interaction between the two protocols $\nu$ and $\Pi_S$. The
rule $R_2$ is executed only if the subgraph pictured between brackets in Fig. 1
does not exist in the neighborhood of $v$.



Correctness of Our Synchronizers. We now introduce some meaningful
properties that can all be guaranteed by the synchronizers we can build.

Definition 3 (Readiness property). A node $v$ is ready for pulse $p$, denoted
Ready$(v,p)$, once it has already received all messages of the algorithm sent to
it by its neighbors during their pulse number $p - 1$.

Definition 4 (Readiness rule). A node $v$ is allowed to generate its pulse $p$
once it is finished with its required original actions for pulse $p - 1$ and
Ready$(v,p)$ holds.

[Figure: relabeling rules $R_1$ and $R_2$ acting on vertex labels of the form
$(i, A, \Omega, B_i, B_{i-1}, S)$ with $S \in \{T, F\}$.]

Fig. 1. A general method to construct a synchronizer.



Lemma 4. Let $\nu$ be a synchronizer built as described in section 7.1. $\nu$
always satisfies the Readiness rule.

Proof. The proof is an immediate consequence of the use of rule $R_2$ (see
Figure 1). A node $v$ is allowed to generate its pulse $p$ if and only if
$S_v = T$ and $v$ satisfies the conditions required by the synchronization
protocol. The only way for $S_v$ to become True is the execution of rule $R_2$.
Thus, $v$ has executed its required $\Pi_S$-actions for pulse $p - 1$ and
Ready$(v,p)$ holds.

Definition 5 (Delay rule). If a node $v$ receives in pulse $p$ a message sent
to it from a neighbor $w$ during some later pulse $p' > p$ of $w$, then $v$
declines consuming it and temporarily stores it in a buffer. It is allowed to
process it only once it has already generated its pulse $p'$.

Lemma 5. Let $\nu$ be a synchronizer built as described in section 7.1. $\nu$
always satisfies the Delay rule.

Proof. Let $v$ be a node that has received in pulse $p$ a message $m'$ sent to it
from a neighbor $w$ during pulse $p' > p$. All our synchronization protocols
guarantee that after a finite number of steps, $v$ and $w$ will be in the same
pulse $p'$. On the other hand, $v$ can only consume the messages contained in
the buffer $B_{p(v)}$. Such a buffer always exists. Indeed, the pulse difference
between two nodes in the whole network is at most 1. This means that $v$ will be
able to consume $m'$ as soon as $p(v) = p'$ holds.
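
The Delay rule can be pictured with a short sketch. It is an illustration
only, not the relabeling system of the paper: the class `DelayBuffer` and its
methods are hypothetical names, and messages are tagged explicitly with the
sender's pulse, which is an assumption made here for simplicity.

```python
from collections import defaultdict


class DelayBuffer:
    """Toy receiver that postpones messages sent during a later pulse."""

    def __init__(self) -> None:
        self.pulse = 0                      # current pulse p(v) of the receiver
        self._stored = defaultdict(list)    # pulse -> messages waiting for it
        self.consumed = []                  # messages processed so far

    def receive(self, message, sender_pulse: int) -> None:
        """Consume the message now, or store it if it comes from a later pulse."""
        if sender_pulse > self.pulse:
            self._stored[sender_pulse].append(message)
        else:
            self.consumed.append(message)

    def advance(self) -> None:
        """Generate the next pulse and release the messages stored for it."""
        self.pulse += 1
        self.consumed.extend(self._stored.pop(self.pulse, []))
```

Since the pulse difference between two nodes is at most 1, a single stored
slot per pulse is enough in this toy model.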

Lemma 6. [9] A synchronizer imposing the readiness and delay rules guarantees 
pulse compatibility. 

The above lemma explains why all our synchronizers guarantee the principle of
pulse compatibility introduced in definition 2. Peleg showed in [9] an
essential relationship between the concept of pulse compatibility and the
correctness of a synchronizer. One of the interesting parts of his work is
stated as the lemma below.

Lemma 7. [9] If synchronizer $\nu$ guarantees pulse compatibility, then it is
correct.

Lemma 8. Let $\nu$ be a synchronizer built as described in section 7.1. $\nu$ is
correct.

Proof. The claim follows from lemmas 4 and 5 together with lemmas 6 and 7.



Abstract. Graph constraints and application conditions are most important
for graph grammars and transformation systems in a large variety of
application areas. Although different approaches have been presented in the
literature already, there is no adequate theory up to now which can be
applied to different kinds of graphs and high-level structures. In this
paper, we introduce an improved notion of graph constraints and application
conditions and show under what conditions the basic results can be extended
from graph transformation to high-level replacement systems. In fact, we use
the new framework of adhesive HLR categories recently introduced as a
combination of HLR systems and adhesive categories. Our main results are the
transformation of graph constraints into right application conditions and
the transformation from right to left application conditions in this new
framework.



1 Introduction 

According to the requirements of several application areas, the rules of a graph
grammar have been equipped in [4] with a very general notion of application
conditions. In a subsequent paper [8], the notion of application conditions is
restricted to contextual conditions like the existence or non-existence of certain
nodes and edges or certain subgraphs in the given graph. In [9], the authors
introduce graphical consistency constraints, also called graph constraints, that
express very basic conditions on graphs, such as the existence or uniqueness of
certain nodes and edges, in a graphical way.

Basic results for graph constraints and application conditions have been 
shown in [9, 10] first for the single and later in the double pushout approach 
for different kinds of graphs. Unfortunately there is no adequate theory up to 
now which can be applied not only to graphs but also to high-level structures in 
the sense of [5]. 

A new version of high-level replacement systems, called adhesive HLR systems,
has been introduced in [6], combining HLR systems in the sense of [5] and
adhesive categories (see [11]). This new framework has been used not only to
reformulate basic results like the Local Church-Rosser, Parallelism and
Concurrency Theorems from [5], but also to present an improved version of the
Embedding Theorem [3] and the Local Confluence Theorem, known as the Critical
Pair Lemma [12]. Moreover, it can be applied to all kinds of graphs and Petri
nets satisfying the HLR1 and HLR2 conditions in [5] and also to typed
attributed graphs in [7].

In this paper we use adhesive HLR categories and systems to improve and 
generalize the basic notions and results for constraints and application conditions 
from graphs to high-level structures. For this purpose we present an improved 
notion of graph constraints, based on positive and negative atomic constraints, 
and of application conditions, based on atomic conditional conditions. In our 
main theorems we show how to transform constraints into right application con- 
ditions, and right into left application conditions in the framework of adhesive 
HLR systems. As an additional condition we only need finite coproducts and a
suitable E-M-factorization, which is valid in all our example categories.

The paper is organized as follows. In section 2 we present our improved no- 
tions of graph constraints and application conditions. In section 3 we give a 
short introduction of adhesive HLR categories together with some basic prop- 
erties. Then we generalize graph constraints and application conditions to the 
framework of adhesive HLR categories. In section 4, we present the main results 
for graphs and high-level structures and give several illustrating examples for 
graphs and place transition nets. A conclusion including further work is given in 
section 5. 



2 Constraints and Application Conditions for Graphs 

In the following, we assume that the reader is familiar with the notions of graphs
and graph morphisms, see e.g. [3,2]. Graph constraints, first investigated in [9],
allow one to express basic conditions on graphs, such as the existence or
uniqueness of certain nodes and edges, in a graphical way.

Definition 1 (graph constraint). An atomic graph constraint is of the form
$\mathrm{PC}(a)$ or $\mathrm{NC}(a)$ where $a: P \to C$ is an arbitrary graph
morphism. It is said to be a positive or negative atomic graph constraint,
respectively. A graph constraint is a Boolean formula over atomic graph
constraints, i.e. every atomic graph constraint is a graph constraint and, for
every graph constraint $c$, $\neg c$ is a graph constraint and, for every index
set $I$ and every family $(c_i)_{i \in I}$ of graph constraints,
$\bigwedge_{i \in I} c_i$ and $\bigvee_{i \in I} c_i$ are graph constraints. A
graph $G$ satisfies $\mathrm{PC}(a)$ ($\mathrm{NC}(a)$), written
$G \models \mathrm{PC}(a)$ ($G \models \mathrm{NC}(a)$), if for every injective
morphism $p: P \to G$ there exists (does not exist) an injective morphism
$q: C \to G$ such that $q \circ a = p$.




Fig. 1. Satisfiability of atomic constraints.

A graph $G$ satisfies a graph constraint $\neg c$, written $G \models \neg c$, if
and only if $G$ does not satisfy the graph constraint $c$. $G$ satisfies
$\bigwedge_{i \in I} c_i$ ($\bigvee_{i \in I} c_i$), written
$G \models \bigwedge_{i \in I} c_i$ ($G \models \bigvee_{i \in I} c_i$), if and
only if $G$ satisfies all (some) graph constraints $c_i$ with $i \in I$.
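
For small finite graphs, satisfaction of atomic graph constraints can be
checked by brute force. The following sketch is one possible reading of the
definition under simplifying assumptions that are not part of the paper:
graphs are directed and simple (a list of nodes plus a set of edge pairs, so a
morphism is determined by its node map), and the morphism $a$ is given as a
dictionary from nodes of $P$ to nodes of $C$. All function names are
hypothetical.

```python
from itertools import permutations


def injective_morphisms(src, dst):
    """Yield all injective, edge-preserving node maps from src to dst."""
    src_nodes, src_edges = src
    dst_nodes, dst_edges = dst
    for image in permutations(dst_nodes, len(src_nodes)):
        f = dict(zip(src_nodes, image))
        if all((f[u], f[v]) in dst_edges for (u, v) in src_edges):
            yield f


def _has_extension(G, P, C, a, p):
    """True if some injective q: C -> G satisfies q o a = p."""
    return any(
        all(q[a[n]] == p[n] for n in P[0])
        for q in injective_morphisms(C, G)
    )


def satisfies_pc(G, P, C, a):
    """G satisfies PC(a): every injective p: P -> G extends along a."""
    return all(_has_extension(G, P, C, a, p) for p in injective_morphisms(P, G))


def satisfies_nc(G, P, C, a):
    """G satisfies NC(a): no injective p: P -> G extends along a."""
    return all(not _has_extension(G, P, C, a, p) for p in injective_morphisms(P, G))
```

With a non-injective $a$ that merges two nodes into one, `satisfies_pc` holds
exactly when the graph has no two distinct nodes, which matches the reading
"there exists at most one node".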

Example 1. Examples of graph constraints:

$\mathrm{PC}(\circ\ \circ \to \circ)$   There exists at most one node.

$\bigwedge_{k=1}^{\infty} \mathrm{NC}(\emptyset \to C_k)$   The graph is acyclic
($C_k$ denotes a cycle of length $k$).

Remark. The definition of graph constraints generalizes the one in [9], because
we allow negative atomic constraints and non-injective $a$.

Fact. If $a$ is non-injective and $G \models \mathrm{PC}(a)$, then there is no
injective $p: P \to G$.

Proof. Assume there is an injective $p: P \to G$. Then $G \models \mathrm{PC}(a)$
implies the existence of an injective $q: C \to G$ with $q \circ a = p$. Since
$p$ is injective, this implies that $a$ is injective (contradiction).

Fact. If $a$ is non-injective, then $G \models \mathrm{NC}(a)$.

Proof. Assume $G \not\models \mathrm{NC}(a)$. Then there exist injective
$p: P \to G$ and $q: C \to G$ with $q \circ a = p$. The injectivity of $p$
implies the injectivity of $a$ (contradiction).

Application conditions for graph replacement rules were first introduced in
[4]. In a subsequent paper [8], a special kind of application condition was
considered which can be represented in a graphical way. In particular,
contextual conditions like the existence or non-existence of certain nodes and
edges or certain subgraphs in the given graph can be expressed. In [9],
so-called conditional application conditions were introduced.

Definition 2 (application condition over a graph). A (conditional) atomic
application condition over a graph $L$ is of the form
$\mathrm{P}(x, \bigvee_{i \in I} x_i)$ or $\mathrm{N}(x, \bigwedge_{i \in I} x_i)$
where $x: L \to X$ is an arbitrary graph morphism and $x_i: X \to C_i$ with
$i \in I$ are injective graph morphisms. It is said to be a positive or negative
atomic application condition, respectively. An application condition over $L$
is a Boolean formula over atomic application conditions over $L$, i.e. every
atomic application condition is an application condition and, for every
application condition $\mathit{acc}$, $\neg \mathit{acc}$ is an application
condition and, for every index set $I$ and every family
$(\mathit{acc}_i)_{i \in I}$ of application conditions,
$\bigwedge_{i \in I} \mathit{acc}_i$ and $\bigvee_{i \in I} \mathit{acc}_i$ are
application conditions.




[Meanings of further constraints of Example 1 whose graphics did not survive:
Every node has a loop. The graph is loop-free. There exists a node without
loop. There exists a node with a loop. There exists no node with a loop.]



Fig. 2. Satisfiability of atomic application conditions.



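A match $m: L \to G$ satisfies
$\mathit{acc}_L = \mathrm{P}(x, \bigvee_{i \in I} x_i)$
($\mathrm{N}(x, \bigwedge_{i \in I} x_i)$), written $m \models \mathit{acc}_L$,
if for all injective morphisms $p: X \to G$ with $p \circ x = m$ there exists
(does not exist) $i \in I$ and an injective morphism $q_i: C_i \to G$ with
$q_i \circ x_i = p$.

A match $m: L \to G$ satisfies an application condition of the form
$\neg \mathit{acc}$, written $m \models \neg \mathit{acc}$, if and only if $m$
does not satisfy the application condition $\mathit{acc}$. A match $m$
satisfies $\bigwedge_{i \in I} \mathit{acc}_i$
($\bigvee_{i \in I} \mathit{acc}_i$), written
$m \models \bigwedge_{i \in I} \mathit{acc}_i$
($m \models \bigvee_{i \in I} \mathit{acc}_i$), if and only if $m$ satisfies all
(some) $\mathit{acc}_i$ with $i \in I$.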

Remark. The definition of an application condition slightly generalizes the ones
in [8,9]. Let us consider the well-known negative application condition
$\mathrm{NAC}(x)$, where $x: L \to X$ is a graph morphism. A match $m: L \to G$
satisfies $\mathrm{NAC}(x)$, written $m \models \mathrm{NAC}(x)$, if there does
not exist an injective morphism $p: X \to G$ with $p \circ x = m$.
$\mathrm{NAC}(x)$ is equivalent to $\mathrm{P}(x, \bigvee_{i \in I} x_i)$ for
$I = \emptyset$ and hence a special case of positive atomic application
conditions.

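Example 2. Examples of application conditions and their meaning for an
injective match $m: L \to G$ (the two nodes are numbered 1 and 2):

$\mathrm{NAC}(\circ_1\ \circ_2 \to \circ_1{\to}\circ_2)$   There is no edge from
node $m(1)$ to node $m(2)$.

$\bigwedge_{k=1}^{\infty} \mathrm{NAC}(\circ_1\ \circ_2 \to P_k)$   There is no
path connecting node $m(1)$ and $m(2)$ ($P_k$ denotes a path of length $k$).

$\mathrm{P}(\circ_1\ \circ_2 \to \circ_1{\to}\circ_2 \to \circ_1{\rightleftarrows}\circ_2)$
If there is an edge from $m(1)$ to $m(2)$, then there also is an edge from
$m(2)$ to $m(1)$.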
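
Satisfaction of a positive atomic application condition by a match can be
checked in the same brute-force style. The sketch below again assumes simple
directed graphs given as a node list and a set of edge pairs, with morphisms
given as node dictionaries; these encodings and all function names are
assumptions made for illustration. The negative application condition is
obtained as the special case with an empty family, as in the remark above.

```python
from itertools import permutations


def injective_morphisms(src, dst):
    """Yield all injective, edge-preserving node maps from src to dst."""
    src_nodes, src_edges = src
    dst_nodes, dst_edges = dst
    for image in permutations(dst_nodes, len(src_nodes)):
        f = dict(zip(src_nodes, image))
        if all((f[u], f[v]) in dst_edges for (u, v) in src_edges):
            yield f


def satisfies_p(G, m, L, X, x, branches):
    """Check m |= P(x, OR_i x_i); branches is a list of pairs (C_i, x_i)."""
    for p in injective_morphisms(X, G):
        if any(p[x[n]] != m[n] for n in L[0]):
            continue  # p does not extend the match, so it imposes nothing
        extended = any(
            all(q[x_i[n]] == p[n] for n in X[0])
            for (C_i, x_i) in branches
            for q in injective_morphisms(C_i, G)
        )
        if not extended:
            return False
    return True


def satisfies_nac(G, m, L, X, x):
    """NAC(x) is P(x, OR over the empty index set)."""
    return satisfies_p(G, m, L, X, x, [])
```

For the first condition of Example 2, `L` consists of the two nodes 1 and 2,
`X` adds the edge from 1 to 2, and `x` is the identity on nodes.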

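A rule $p = (L \leftarrow K \rightarrow R)$ consists of two injective graph
morphisms with a common domain $K$. Given a rule $p$ and a graph morphism
$K \to D$, a direct derivation consists of two pushouts (1) and (2). We write
$G \Rightarrow_{p,m,m^*} H$ and say that $m: L \to G$ is the match and
$m^*: R \to H$ is the comatch of $p$ in $H$.

$$
\begin{array}{ccccc}
L & \leftarrow & K & \rightarrow & R \\
{\scriptstyle m}\downarrow & (1) & \downarrow & (2) & \downarrow{\scriptstyle m^*} \\
G & \leftarrow & D & \rightarrow & H
\end{array}
$$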



Definition 3 (application condition for a rule). An application condition
$A(p) = (A_L, A_R)$ for a rule $p = (L \leftarrow K \rightarrow R)$ consists of a
left application condition $A_L$ over $L$ and a right application condition
$A_R$ over $R$. A direct derivation $G \Rightarrow_{p,m,m^*} H$ satisfies an
application condition $A(p) = (A_L, A_R)$ if $m \models A_L$ and
$m^* \models A_R$.



3 Constraints and Application Conditions for High-Level Structures



The main idea of high-level replacement systems is to generalize the concepts 
of graph replacement from graphs to all kinds of structures which are of inter- 
est in Computer Science and Mathematics. In the following, we will consider 
constraints and application conditions in adhesive HLR-categories (see [6]) and 
prove our transformation results on this general level. 



Definition 4 (adhesive HLR-category). A category $\mathbf{C}$ with a morphism
class $M$ is called adhesive HLR category, if 1) $M$ is a class of
monomorphisms closed under compositions and decompositions ($g \circ f \in M$,
$g \in M$ implies $f \in M$), 2) $\mathbf{C}$ has pushouts and pullbacks along
$M$-morphisms, i.e. pushouts and pullbacks where at least one of the given
morphisms is in $M$, and $M$-morphisms are closed under pushouts and pullbacks,
and 3) pushouts in $\mathbf{C}$ along $M$-morphisms are VK-squares, i.e. for any
commutative cube in $\mathbf{C}$ where we have the pushout with $m \in M$ in the
bottom and the back faces are pullbacks, it holds: the top is a pushout
$\Leftrightarrow$ the front faces are pullbacks.



[Diagram: commutative cube over the bottom pushout square on $A, B, C, D$ with
$m \in M$; the top face carries the primed objects.]



Example 3. All examples of adhesive categories defined in [11] are adhesive HLR
categories for the class $M$ of all monomorphisms. As shown in [11], this
includes the categories Sets of sets, Graphs of graphs and several variants of
graphs like typed, labelled and hypergraphs. Moreover, this includes the
category PT-Net of place transition nets considered in [5] already. The
following categories are important examples of adhesive HLR categories where $M$
is not the class of all monomorphisms: the category (AGraphs_ATG, $M$) of typed
attributed graphs with type graph ATG and class $M$ of all injective morphisms
with isomorphism on the data part (see [7]), the category (AHL-Net, $M$) of
algebraic high-level nets with class $M$ of all strict injective net morphisms,
and the category (Spec, $M$) of algebraic specifications with class $M$ of all
strict injective specification morphisms [5].



Fact (HLR properties of adhesive HLR categories). Given an adhesive
HLR-category $(\mathbf{C}, M)$, the following HLR conditions are satisfied.

1. Pushouts along $M$-morphisms are pullbacks.

2. Pushout-pullback decomposition: If the diagram (1)+(2) is a pushout, (2) a
pullback, and $l, w \in M$, then (1) and (2) are pushouts and also pullbacks.

3. Uniqueness of pushout complements for $M$-morphisms: Given $b: A \to B$ in
$M$ and $s: B \to D$, then there is up to isomorphism at most one $C$ with
$l: A \to C$ and $u: C \to D$ such that diagram (1) is a pushout.

Proof. See [6,11].

General assumption. In the following, we assume that $(\mathbf{C}, M)$ is an
adhesive HLR category with binary coproducts and epi-$M$-factorizations, that
is, for every morphism there is an epi-mono-factorization with monomorphism in
$M$.

We will consider structural constraints and application conditions in our
general framework. Structural constraints, constraints for short, correspond to
graph constraints in section 2, but not necessarily to logical constraints
defined by predicate logic.

Definition 5 (constraints). An atomic constraint is of the form $\mathrm{PC}(a)$
or $\mathrm{NC}(a)$ where $a: P \to C$ is an arbitrary morphism.
$\mathrm{PC}(a)$ is said to be positive and $\mathrm{NC}(a)$ negative. A
constraint is a Boolean formula over atomic constraints. An object $G$ satisfies
$\mathrm{PC}(a)$ ($\mathrm{NC}(a)$), written $G \models \mathrm{PC}(a)$
($G \models \mathrm{NC}(a)$), if for every morphism $p: P \to G$ in $M$ there
exists (does not exist) a morphism $q: C \to G$ in $M$ such that
$q \circ a = p$ (see figure 1). Satisfiability of arbitrary constraints is
defined in the usual way (see definition 1).

Definition 6 (application condition over an object). An atomic application
condition over an object $L$ is of the form
$\mathrm{P}(x, \bigvee_{i \in I} x_i)$ or $\mathrm{N}(x, \bigwedge_{i \in I} x_i)$
where $x: L \to X$ is an arbitrary morphism and $x_i: X \to C_i$ with $i \in I$
are morphisms in $M$. It is said to be a positive or negative atomic application
condition, respectively. An application condition over $L$ is a Boolean formula
over atomic application conditions over $L$. A match $m: L \to G$ satisfies
$\mathit{acc}_L = \mathrm{P}(x, \bigvee_{i \in I} x_i)$
($\mathrm{N}(x, \bigwedge_{i \in I} x_i)$), written $m \models \mathit{acc}_L$,
if for all morphisms $p: X \to G$ in $M$ with $p \circ x = m$ there exists (does
not exist) $i \in I$ and a morphism $q_i: C_i \to G$ in $M$ with
$q_i \circ x_i = p$ (see figure 2). Satisfiability of arbitrary application
conditions is defined in the usual way (see definition 2).

The special case of negative atomic application conditions NAC(x) and gen- 
eral application conditions for rules (see definition 3) are defined as in the graph 
case. 

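General Remark. In the case $I = \emptyset$ we have
$\mathrm{P}(x, \bigvee_{i \in I} x_i) = \mathrm{NAC}(x)$, where
$m \models \mathrm{NAC}(x)$ means that there is no $p \in M$ with
$p \circ x = m$. Moreover we have
$\mathrm{N}(x, \bigwedge_{i \in I} x_i) = \mathrm{true}$ for $I = \emptyset$,
because
$m^* \models \mathrm{N}(x, \bigwedge_{i \in \emptyset} x_i) \Leftrightarrow \forall p\,(p \in M \wedge p \circ x = m^* \Rightarrow \neg(\exists i \in I = \emptyset \ldots)) \Leftrightarrow \forall p\ \mathrm{true} \Leftrightarrow \mathrm{true}$.



4 Main Results for Graphs and High-Level Structures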



In the following, we will show that arbitrary constraints can be transformed into
right application conditions and that right application conditions can be
transformed into left application conditions. We first show that positive and
negative atomic constraints can be transformed into right application conditions.

Lemma 1 (transformation of positive atomic constraints into right application
conditions). Given a positive atomic constraint $\mathrm{PC}(a)$ with
$a: P \to C$ and comatch $m^*: R \to H$. Then there is a right application
condition $T(\mathrm{PC}(a))$ such that
$m^* \models T(\mathrm{PC}(a)) \Leftrightarrow H \models \mathrm{PC}(a)$.

Construction. Let $T(\mathrm{PC}(a))$ be the right application condition

$$T(\mathrm{PC}(a)) = \bigwedge_S \mathrm{P}(R \xrightarrow{s} S, \bigvee_{i \in I} (S \xrightarrow{t_i \circ t} T_i))$$

1. The conjunction $\bigwedge_S$ ranges over all “gluings” $S$ of $R$ and $P$ in
figure 3(a). More precisely, it ranges over all triples $(S, s, p)$ with
arbitrary $s: R \to S$ and $p: P \to S$ in $M$ such that the pair $(s, p)$ is
jointly epimorphic. For each such triple $(S, s, p)$ we construct the pushout
(1) of $p$ and $a$ leading to $t: S \to T$ and $q: C \to T$.

2. The disjunction $\bigvee_{i \in I}$ ranges over all
$S \xrightarrow{t_i \circ t} T_i$ with epimorphism $t_i$ such that $t_i \circ t$
and $t_i \circ q$ are in $M$. For $I = \emptyset$ we have
$T(\mathrm{PC}(a)) = \bigwedge_S \mathrm{NAC}(R \xrightarrow{s} S)$.



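[Figure 3(a): morphisms $s: R \to S$ and $p: P \to S$, with pushout (1) of $p$
and $a: P \to C$.]

Fig. 3. Construction of $T(\mathrm{PC}(a))$ / Correspondence of
$T(\mathrm{PC}(a))$ and $\mathrm{PC}(a)$.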



Proof. See appendix A. 

Example 4. Consider the positive atomic graph constraint $\mathrm{PC}(a)$, where
$a$ maps a single node to a node with a loop (see example 1), and the rule
$p = (\circ\ \circ \leftarrow \circ\ \circ \rightarrow \circ{\to}\circ)$.
According to the construction in lemma 1, the graph constraint can be
transformed into the following conjunction of right positive atomic application
conditions
$\bigwedge_{i=1}^{4} \mathrm{P}(R \to S_i, S_i \to T_i) \wedge \mathrm{P}(R \to S_5, \bigvee_{j=1}^{2} S_5 \to T_{5j})$
with $S_i, T_i, T_{5j}$ as shown below. The condition expresses the positive
atomic application condition “Every node outside (see $T_1, T_4$) and inside
(see $T_2, T_3, T_{51}, T_{52}$) the comatch must have a loop.”, where
$S_1, S_2, S_3$ correspond to injective and $S_4, S_5$ to non-injective
comatches. Altogether this condition means that for each comatch
$m^*: R \to H$ each node of $H$ must have a loop, which is

Lemma 2 (transformation of negative atomic constraints into application
conditions). Given a negative constraint $\mathrm{NC}(a)$ with $a: P \to C$ and
comatch $m^*: R \to H$. Then there is an application condition
$T(\mathrm{NC}(a))$ such that
$m^* \models T(\mathrm{NC}(a)) \Leftrightarrow H \models \mathrm{NC}(a)$.

Construction. Let $T(\mathrm{NC}(a))$ be the following right application
condition

$$T(\mathrm{NC}(a)) = \bigwedge_S \mathrm{N}(R \xrightarrow{s} S, \bigwedge_{i \in I} (S \xrightarrow{t_i \circ t} T_i)),$$

where the morphisms $s: R \to S$ and $t_i \circ t: S \to T_i$ are the same as in
the construction of lemma 1 and $m^* \models T(\mathrm{NC}(a))$ is now defined
by: For all $(S, s, p)$ as given in the construction and all $p'': S \to H$ in
$M$ with $p'' \circ s = m^*$ there is no $i \in I$ and $q'': T_i \to H$ in $M$
with $q'' \circ t_i \circ t = p''$. For $I = \emptyset$ we have, according to the
general remark after definition 6,
$m^* \models T(\mathrm{NC}(a)) \Leftrightarrow \mathrm{true}$.

Proof. See appendix A. 

Example 5. According to example 3 we now consider place transition nets.
Consider the negative atomic net constraint
$\mathrm{NC}(\emptyset \to \circ{-}\square)$. $H$ satisfies this constraint if
$H$ contains no subnet of the form $\circ{-}\square$, where we call such a place
a “sink place”. Consider the rule
$p = (\ldots \leftarrow \circ\ \circ \rightarrow \circ{-}\square{-}\circ)$.
According to the construction in lemma 2, the constraint can be transformed
into the application condition
$\mathrm{N}(R \to S_1, \bigwedge_{j=1}^{3} S_1 \to T_{1j}) \wedge \mathrm{N}(R \to S_2, \bigwedge_{j=1}^{2} S_2 \to T_{2j})$
where $R, S_1, S_2, T_{ij}$ are given below. The condition means “No sink place
is allowed to be outside or inside the comatch, i.e. no sink place is allowed in
$H$.”, where $S_1$ takes care of injective and $S_2$ of non-injective comatches
$m^*: R \to H$.

[Figure: the nets $R$, $S_1$, $S_2$ and $T_{11}, T_{12}, T_{13}, T_{21}, T_{22}$.]

The transformations in lemmas 1 and 2 can be extended to arbitrary
constraints.

Theorem 1 (transformation of constraints into application conditions).
Given a constraint $c$ and a comatch $m^*: R \to H$. Then there is an
application condition $T(c)$ such that
$m^* \models T(c) \Leftrightarrow H \models c$.

Proof. For atomic constraints, the transformation is given in the proofs of
lemmas 1 and 2, respectively. For arbitrary constraints, the transformation is
inductively defined as follows: $T(\neg c) = \neg T(c)$,
$T(\bigwedge_{i \in I} c_i) = \bigwedge_{i \in I} T(c_i)$ and
$T(\bigvee_{i \in I} c_i) = \bigvee_{i \in I} T(c_i)$. Now the proof of the
statement is straightforward.
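
The inductive part of this proof is a plain structural recursion over Boolean
formulas. The following sketch shows only that recursion; it does not
implement the constructions of lemmas 1 and 2, which are passed in as a
callable. The tuple encoding of formulas and the names `transform` and
`evaluate` are assumptions made for this illustration.

```python
def transform(c, transform_atomic):
    """Push a transformation T through negation, conjunction and disjunction.

    Formulas are nested tuples: ("PC", a), ("NC", a), ("not", c),
    ("and", [c1, ...]) or ("or", [c1, ...]). The atomic case is delegated to
    the caller-supplied function transform_atomic.
    """
    tag = c[0]
    if tag in ("PC", "NC"):
        return transform_atomic(c)
    if tag == "not":
        return ("not", transform(c[1], transform_atomic))
    if tag in ("and", "or"):
        return (tag, [transform(sub, transform_atomic) for sub in c[1]])
    raise ValueError(f"unknown constraint tag: {tag!r}")


def evaluate(formula, eval_atomic):
    """Evaluate a formula, given a truth function for its atomic parts."""
    tag = formula[0]
    if tag == "not":
        return not evaluate(formula[1], eval_atomic)
    if tag == "and":
        return all(evaluate(sub, eval_atomic) for sub in formula[1])
    if tag == "or":
        return any(evaluate(sub, eval_atomic) for sub in formula[1])
    return eval_atomic(formula)
```

If the atomic transformation preserves truth values, as lemmas 1 and 2 state
for comatches, then `evaluate` returns the same value on a constraint and on
its transformed application condition, which is the content of the theorem.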

In the following, we will show that arbitrary right application conditions can 
be transformed into left application conditions. For this purpose, we first show 
that right positive and then right negative atomic application conditions can be 
transformed into corresponding left atomic application conditions. 

Lemma 3 (transformation from right positive atomic to left positive
application conditions). Given a rule $p = (L \leftarrow K \rightarrow R)$ and a
right positive atomic application condition $\mathit{acc}_R$, then there is a
left positive atomic application condition $\mathit{acc}_L$ such that for all
direct derivations $G \Rightarrow_{p,m,m^*} H$ we have:
$m \models \mathit{acc}_L \Leftrightarrow m^* \models \mathit{acc}_R$.

Construction. Let
$\mathit{acc}_R = \mathrm{P}(R \xrightarrow{x} X, \bigvee_{i \in I} (X \xrightarrow{x_i} C_i))$
be a right positive atomic application condition in figure 4. Then we construct
a left positive atomic application condition
$\mathit{acc}_L = p^{-1}(\mathit{acc}_R) = \mathrm{P}(L \xrightarrow{y} Y, \bigvee_{i \in I'} (Y \xrightarrow{y_i} D_i))$
with $I' \subseteq I$ or $p^{-1}(\mathit{acc}_R) = \mathrm{true}$ as follows:

1. If the pair $(r: K \to R, x: R \to X)$ has a pushout complement, define
$y: L \to Y$ by two pushouts (1) and (2), otherwise
$p^{-1}(\mathit{acc}_R) = \mathrm{true}$.

2. For each $i \in I$, if the pair $(r^*: Z \to X, x_i: X \to C_i)$ has a pushout
complement, then $i \in I'$ and $y_i: Y \to D_i$ is defined by two pushouts (3)
and (4), otherwise $i \notin I'$. Since pushout complements of $M$-morphisms (if
they exist) are unique, the construction yields a unique result up to
isomorphism.



Proof. See appendix A.

Fig. 4. Transformation of application conditions.



Example 6. Consider the rule
$p = (\circ{\to}\circ \leftarrow \circ\ \circ \rightarrow \circ\ \circ)$ and,
according to the remark after definition 2, the right application condition
$\mathit{acc}_R = \mathrm{NAC}(\circ\ \circ \to \circ{\to}\circ)$ (see example 2),
meaning that an edge between the nodes in the comatch must not exist. According
to the construction of lemma 3 with $I = \emptyset$, the right atomic
application condition is transformed into the left atomic application condition
$\mathit{acc}_L = \mathrm{NAC}(\circ{\to}\circ \to \circ{\rightrightarrows}\circ)$
with $I = \emptyset$, meaning that two parallel edges between the nodes in the
match must not exist.

[Diagram: the pushouts (1) and (2) for this rule, with $L, K, R$ in the top row
and $Y, Z, X$ in the bottom row.]

Remark. 1. Dually we can construct from $\mathit{acc}_L$ a right atomic
application condition $\mathit{acc}_R = p(\mathit{acc}_L)$ such that
$m \models \mathit{acc}_L \Leftrightarrow m^* \models p(\mathit{acc}_L)$.

2. For $I = \emptyset$, $\mathit{acc}_R$ is a negative atomic application
condition, i.e. $\mathit{acc}_R \Leftrightarrow \mathrm{NAC}(x)$. In this case
$p^{-1}(\mathit{acc}_R)$ is either true or also a negative atomic application
condition, i.e. $p^{-1}(\mathit{acc}_R) \Leftrightarrow \mathrm{NAC}(y)$. For
$I \neq \emptyset$, $\mathit{acc}_R$ is a “real” atomic application condition
and $p^{-1}(\mathit{acc}_R)$ may be either true, a negative atomic application
condition (if $I' = \emptyset$) or also a “real” atomic application condition.

3. Since $x_i$ ($i \in I$) and also $r$ and $r^*$ are in $M$, also $z_i$ and
$y_i$ are in $M$ in (3) and (4) respectively ($M$-morphisms are closed under
pushouts and pullbacks).

Lemma 4 (transformation from right negative atomic to left negative
application conditions). Given a rule $p = (L \leftarrow K \rightarrow R)$ and a
right negative atomic application condition $\mathit{acc}_R$, then there is a
left negative atomic application condition $\mathit{acc}_L$ such that for all
direct derivations $G \Rightarrow_{p,m,m^*} H$ we have:
$m \models \mathit{acc}_L \Leftrightarrow m^* \models \mathit{acc}_R$.

Construction. Let
$\mathit{acc}_R = \mathrm{N}(R \xrightarrow{x} X, \bigwedge_{i \in I} (X \xrightarrow{x_i} C_i))$
be a right negative atomic application condition. Then we construct a left
negative atomic application condition $\mathit{acc}_L = p^{-1}(\mathit{acc}_R)$
as follows:

1. If $I = \emptyset$ then $\mathit{acc}_R = \mathrm{true}$ and we define
$\mathit{acc}_L = \mathrm{true}$.

2. If $I \neq \emptyset$ and $(r, x)$ has no pushout complement then
$\mathit{acc}_L = \mathrm{true}$.

3. If $I \neq \emptyset$ and $(r, x)$ has a pushout complement then define
$y: L \to Y$ by two pushouts (1) and (2) in figure 4. Moreover, for each
$i \in I$, if $(r^*, x_i)$ has a pushout complement then $i \in I'$ and
$y_i: Y \to D_i$ is defined by pushouts (3) and (4) in figure 4, otherwise
$i \notin I'$. Now define

$$\mathit{acc}_L = p^{-1}(\mathit{acc}_R) = \mathrm{N}(L \xrightarrow{y} Y, \bigwedge_{i \in I'} (Y \xrightarrow{y_i} D_i))$$

where $\mathit{acc}_L = \mathrm{true}$ in the case $I' = \emptyset$.

Proof. See appendix A. 

Example 7. Consider the same rule $p$ as in example 5 and the following right
negative atomic application condition
$\mathit{acc}_R = \mathrm{N}(R \to X, X \to C)$, which corresponds to
$\mathrm{N}(R \to S_2, S_2 \to T_{22})$ in example 5. $H \models \mathit{acc}_R$
means that for each non-injective comatch $m^*: R \to H$ the place in $m^*(R)$
must not be a sink place. According to the general construction in figure 4 we
obtain the following left negative atomic application condition
$\mathit{acc}_L = \mathrm{N}(L \to Y, Y \to D)$. $G \models \mathit{acc}_L$ means
that for each non-injective match $m: L \to G$ the place in $m(L)$ must not be a
sink place. Note that a non-injective match $m: L \to G$ can only identify the
places, because otherwise the gluing condition would be violated.

[Figure: the nets $L$, $Y$, $D$ and $X$ of this construction.]

The transformations in lemmas 3 and 4 can be extended to arbitrary right
application conditions.

Theorem 2 (transformation from right to left application conditions).
Given a rule $p = (L \leftarrow K \rightarrow R)$ and a right application
condition $\mathit{acc}_R$, then there is a left application condition
$\mathit{acc}_L$ such that for all direct derivations
$G \Rightarrow_{p,m,m^*} H$ we have:
$m \models \mathit{acc}_L \Leftrightarrow m^* \models \mathit{acc}_R$.

Proof. For right atomic application conditions, the transformation is defined as
in the proofs of lemmas 3 and 4, respectively. For arbitrary right application
conditions, the transformation is defined as follows:
$p^{-1}(\neg \mathit{acc}_R) = \neg p^{-1}(\mathit{acc}_R)$,
$p^{-1}(\bigwedge_{i \in I} \mathit{acc}_{iR}) = \bigwedge_{i \in I} p^{-1}(\mathit{acc}_{iR})$,
and
$p^{-1}(\bigvee_{i \in I} \mathit{acc}_{iR}) = \bigvee_{i \in I} p^{-1}(\mathit{acc}_{iR})$.
Now the proof of the statement is straightforward.

5 Conclusion 

In the present paper we have introduced a general notion of constraints and
application conditions that is more expressive than previous ones in the graph
case and is now formulated for high-level structures in the new framework of
adhesive HLR categories (see [6]). It is shown that constraints can be transformed
into right application conditions for rules and that right application conditions
can be transformed into left ones. As a consequence, we have a mechanism
to integrate constraints into rules and to ensure that the constraints remain
satisfied.

Further topics could be the following.

(1) Extension of the notions of constraints and application conditions:
Although the constraints are more general than the ones in [9], there are
constraints which cannot be expressed up to now. For example, constraints like
“Every node has an outgoing or incoming edge” and “There exists a node such
that all outgoing edges are labelled by a” could be expressed if one extended
the concepts by alternative or conditional atomic graph constraints and
existential satisfaction of graph constraints. Also of interest are the
extension of constraints from static (propositional logic) to dynamic (temporal
logic) constraints [10] and a transformation between logical and graphical
constraints (e.g. OCL constraints [1]).

(2) Extensions of the theory: In [8], the local Church-Rosser theorems I and II
are proved for single-pushout rules with negative atomic application conditions.
These results are also valid for double-pushout rules with arbitrary
application conditions in high-level structures, provided that the notion of
independence is extended such that the induced matches satisfy the
corresponding application conditions. Moreover, it would be important to
generalize the results in [6] to rules with application conditions.

(3) Applications of the theory: The theory can already be applied not only
to graph transformations over labelled graphs (see [3,2]) but also to several
variants of graphs like typed attributed graphs and hypergraphs and also to Petri
nets (see [5,6] and examples 4-7). For building up Petri nets satisfying some
net constraints, one could integrate the constraints as application conditions
into the rules. Another important application of adhesive HLR categories and
corresponding systems is typed attributed graph transformation as presented in
[7], where a slight extension of the general assumption in section 3 also seems
to be useful.

A Appendix

In this appendix we present the proofs of all lemmata given in section 4. 

Proof of Lemma 1. 1. Let $m^* \models T(PC(a))$. We have to show $H \models PC(a)$, i.e.
for all morphisms $p': P \to H$ in $M$ there is a morphism $q': C \to H$ in $M$ with
$q' \circ a = p'$. Given a morphism $p': P \to H$ in $M$ and a comatch $m^*: R \to H$, we
construct the coproduct $R + P$ with injections $in_R$ and $in_P$ in figure 3(b). By the
universal property of coproducts, there is a unique morphism $f: R + P \to H$ with
$f \circ in_R = m^*$ and $f \circ in_P = p'$. Now let $f = p'' \circ e$ be an epi-mono factorization
of $f$ with epimorphism $e$ and monomorphism $p''$ in $M$, and define $s = e \circ in_R$
and $p = e \circ in_P$. Then the pair $(s, p)$ is jointly epimorphic, because $e$ is an
epimorphism, and $p$ is in $M$, because $p'' \circ p = p'' \circ e \circ in_P = f \circ in_P = p'$ is in $M$
($M$-morphisms are closed under decomposition). Hence $(S, s, p)$ belongs to the
conjunction $\wedge_S$ of $T(PC(a))$. Moreover we have
$p'' \circ s = p'' \circ e \circ in_R = f \circ in_R = m^*$ with monomorphism $p''$ in $M$.
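
The construction in this step can be made concrete in the category of finite sets, taking $M$ to be the injective functions. This is only an illustrative sketch under that assumption: the proof itself works in any category satisfying the general assumptions of section 3, and the function name and the dictionary encoding of morphisms below are hypothetical.

```python
def coproduct_factorization(m_star, p_prime):
    """Finite-set sketch: m_star: R -> H and p_prime: P -> H are dicts.

    Returns (S, s, p, p2) where p2: S -> H is injective, (s, p) is jointly
    surjective, p2 . s = m_star and p2 . p = p_prime.
    """
    # Coproduct R + P as a tagged disjoint union with injections in_R, in_P.
    in_R = {x: ("R", x) for x in m_star}
    in_P = {x: ("P", x) for x in p_prime}
    # Unique induced morphism f: R + P -> H with f . in_R = m*, f . in_P = p'.
    f = {("R", x): h for x, h in m_star.items()}
    f.update({("P", x): h for x, h in p_prime.items()})
    # Epi-mono factorization f = p2 . e through the image S = f(R + P).
    S = sorted(set(f.values()))
    e = dict(f)                 # e: R + P -> S, surjective by construction
    p2 = {h: h for h in S}      # p'': S -> H, the inclusion of the image
    s = {x: e[in_R[x]] for x in m_star}
    p = {x: e[in_P[x]] for x in p_prime}
    return S, s, p, p2


m_star = {"r1": "h1", "r2": "h2"}
p_prime = {"a": "h2", "b": "h3"}
S, s, p, p2 = coproduct_factorization(m_star, p_prime)
print(S, s, p)
```

In this encoding the element h2 is hit by both the comatch and p', so S records the overlap between the two images, which is what the triples $(S, s, p)$ enumerate.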

In the case $I \neq \emptyset$, $m^* \models T(PC(a))$ implies existence of $i \in I$ and
$q'': T_i \to H$ in $M$ with $q'' \circ t_i \circ t = p''$. Now let $q' = q'' \circ t_i \circ q$; then $q'$ is in $M$,
because $q''$ is in $M$ by construction and $t_i \circ q$ is in $M$ by step 2 in the construction
($M$-morphisms are closed under decompositions). Finally we have $H \models PC(a)$,
because $q' \circ a = q'' \circ t_i \circ q \circ a = q'' \circ t_i \circ t \circ p = p'' \circ p = p'$.

In the case $I = \emptyset$, the existence of $p'' \in M$ with $p'' \circ s = m^*$ contradicts
$m^* \models T(PC(a)) = \wedge_S NAC(s)$. Hence our assumption to have a $p': P \to H$ in
$M$ is false, which implies $H \models PC(a)$.

2. Let $H \models PC(a)$. We have to show $m^* \models T(PC(a))$, i.e. for all triples
$(S, s, p)$ constructed in step 1 and all monomorphisms $p'': S \to H$ in $M$ with
$p'' \circ s = m^*$ we have to find $i \in I$ and a morphism $q'': T_i \to H$ in $M$ with
$q'' \circ t_i \circ t = p''$. Given $(S, s, p)$ and $p''$ in $M$ as above we define $p' = p'' \circ p: P \to H$.
Then $p'$ is in $M$, because $p$ and $p''$ are in $M$, and $H \models PC(a)$ implies $q': C \to H$
in $M$ with $q' \circ a = p'$. Hence $p'' \circ p = p' = q' \circ a$. The universal property of
pushouts implies the existence of a unique morphism $h: T \to H$ with $h \circ t = p''$
and $h \circ q = q'$. Now let $h = q'' \circ e'$ be an epi-mono factorization of $h$ with
epimorphism $e'$ and monomorphism $q''$ in $M$. Then $q'' \circ e' \circ t = h \circ t = p''$ in $M$
implies $e' \circ t$ is in $M$ and $q'' \circ e' \circ q = h \circ q = q'$ in $M$ implies $e' \circ q$ in $M$ ($M$ closed
under decompositions). Hence according to construction step 2 $e' \circ t$ belongs to
the family of $T(PC(a))$ such that $e' = t_i: T \to T_i$ for some $i \in I$.

In the case $I \neq \emptyset$ we have $q''$ in $M$ and $q'' \circ t_i \circ t = q'' \circ e' \circ t = h \circ t = p''$
implies $m^* \models T(PC(a))$. In the case $I = \emptyset$ we have a contradiction which means
that our assumption to have $p'' \in M$ with $p'' \circ s = m^*$ is false. This implies
$m^* \models \wedge_S NAC(s) = T(PC(a))$.

Remark. The proof of lemma 1 in both directions does not require that $a: P \to C$
is in $M$. If $a$ is not in $M$, however, then $t$ is not in $M$. (In fact, $p$ in $M$ in
pushout (1) implies (1) pushout and pullback such that $t$ in $M$ would imply $a$
in $M$.) Hence there is no $t_i$ in $M$ s.t. $t_i \circ t$ is in $M$. This implies $I = \emptyset$, s.t.
$T(PC(a)) = \wedge_S NAC(s)$. Hence we have for $a$ not in $M$ or $I = \emptyset$ the equivalence

$$m^* \models \wedge_S NAC(s) \iff H \models PC(a).$$

Proof of Lemma 2. 1. Let $m^* \models T(NC(a))$ and $I \neq \emptyset$. The claim $H \not\models NC(a)$
implies the existence of morphisms $p': P \to H$ and $q': C \to H$ in $M$ with $q' \circ a = p'$
and will lead to a contradiction. Given $p': P \to H$ in $M$ as above and $m^*$ we
construct the coproduct $R + P$. This leads to a unique $f: R + P \to H$ with
$f \circ in_R = m^*$ and $f \circ in_P = p'$. Now let $f = p'' \circ e$ be an epi-mono factorization of
$f$ with epimorphism $e$ and monomorphism $p''$ in $M$ and define $s = e \circ in_R$ and
$p = e \circ in_P$. Similar to part 1 of the proof of lemma 1 we have that $(S, s, p)$ belongs
to the family $\wedge_S$ of $T(NC(a))$ and we have $p'' \circ p = p'' \circ e \circ in_P = f \circ in_P = p'$.
Moreover we have $p'' \circ s = p'' \circ e \circ in_R = f \circ in_R = m^*$ with monomorphism
$p''$ in $M$ and $q' \circ a = p'$. Now pushout (1) in figure 3 implies existence
of $h: T \to H$ with $h \circ t = p''$ and $h \circ q = q'$. Now let $h = q'' \circ e'$ be an epi-mono
factorization of $h$ with epimorphism $e'$ and monomorphism $q''$ in $M$. Then, by
the decomposition property of $M$, $p'' = h \circ t = q'' \circ e' \circ t$ in $M$ implies $e' \circ t$ in $M$
and $q' = h \circ q = q'' \circ e' \circ q$ implies $e' \circ q$ in $M$. Hence $e' \circ t$ belongs to the family
$t_i \circ t$ in the construction of $T(NC(a))$. Hence there is $i \in I$ with $e' = t_i: T \to T_i$
and $q''$ in $M$ with $q'' \circ t_i \circ t = q'' \circ e' \circ t = h \circ t = p''$.

Our assumption $m^* \models T(NC(a))$ implies that for all $(S, s, p)$ as in the
construction and all $p'': S \to H$ in $M$ with $p'' \circ s = m^*$ as given above there is no
$i \in I$ and $q'': T_i \to H$ in $M$ with $q'' \circ t_i \circ t = p''$. This is a contradiction to
existence of $i \in I$ and $q''$ constructed above.

2. Let $H \models NC(a)$ and $I \neq \emptyset$. The claim $m^* \not\models T(NC(a))$ will lead to
a contradiction. For $I \neq \emptyset$, $m^* \not\models T(NC(a))$ implies the existence of $(S, s, p)$
as in the construction of $T(NC(a))$, and existence of $p'': S \to H$ in $M$ with
$p'' \circ s = m^*$, existence of $i \in I$ with $t_i \circ t$, $t_i \circ q$ in $M$, and existence of $q'': T_i \to H$
in $M$ with $q'' \circ t_i \circ t = p''$. Now let $p' = p'' \circ p$, then $p, p''$ in $M$ implies $p'$ in $M$.
Further let $q' = q'' \circ t_i \circ q$, then $q''$, $t_i \circ q$ in $M$ implies $q'$ in $M$. This implies
$q' \circ a = q'' \circ t_i \circ q \circ a = q'' \circ t_i \circ t \circ p = p'' \circ p = p'$. Hence we have, in contradiction
to $H \models NC(a)$, $p'$ and $q'$ in $M$ with $q' \circ a = p'$. For $I = \emptyset$, $m^* \not\models T(NC(a))$
implies existence of $p'' \in M$ with $p'' \circ s = m^*$. But this contradicts $H \models NC(a)$.

3. Let $I = \emptyset$. Then $m^* \models T(NC(a)) \iff \text{true}$ because $T(NC(a)) = \text{true}$ in
this case. The claim $H \not\models NC(a)$ leads according to part 1 of the proof to $I \neq \emptyset$
which contradicts $I = \emptyset$. Hence we have $H \models NC(a)$. This means we have for
$I = \emptyset$: $m^* \models T(NC(a)) \iff H \models NC(a) \iff \text{true}$.

Remark. The proof of lemma 2 does not require that $a$ is in $M$. If $a$ is not in
$M$ we have again $I = \emptyset$ (as in the remark after lemma 1).

Proof of Lemma 3. Let $G \Rightarrow H$ be any direct derivation.

Case 1. The pair $(r: K \to R, x: R \to X)$ has no pushout complement. Then
$p^{-1}(acc_R) = \text{true}$ and $m \models p^{-1}(acc_R)$. We have to show $m^* \models acc_R$. This is
true, because there is no $p: X \to H$ with $p \in M$ and $p \circ x = m^*$. Otherwise,
since the pair $(r, m^*)$ has a pushout complement, the pair $(r, x)$ would have a
pushout complement in contradiction to case 1 (pushout-pullback decomposition,
$r, p \in M$).

Case 2. The pair $(r: K \to R, x: R \to X)$ has a pushout complement and $I \neq \emptyset$.

Case 2.1. $m \models p^{-1}(acc_R)$. We have to show $m^* \models acc_R$, i.e. given a morphism
$p: X \to H$ in $M$ with $p \circ x = m^*$ we have to find an $i \in I$ and a morphism
$q: C_i \to H$ in $M$ with $q \circ x_i = p$. From the double pushout for $G \Rightarrow_{p,m,m^*} H$
and $p \circ x = m^*$ we obtain the following decomposition in pushouts (1), (2),
(5), (6): First (5) is constructed as pullback of $p$ and $d$ leading to pushouts (1)
and (5) (pushout-pullback decomposition lemma, $r, p$ in $M$), with same square
(1) as in the construction because of uniqueness of pushout complements for
$M$-morphisms. Then (2) is constructed as pushout and we have $p': Y \to G$ with
$p' \circ y = m$ and pushout (6) induced by the pushouts (2) and (2) + (6). Since
$p$ is in $M$, $z$ and $p'$ are in $M$ ($M$-morphisms are closed under pullbacks and
pushouts).

In the case $I' = \emptyset$ we have no $p: X \to H$ with $p \in M$ and $p \circ x = m^*$, because this
would imply $p': Y \to G$ with $p' \in M$ and $p' \circ y = m$ violating $m \models p^{-1}(acc_R)$.
Having no $p$ with $p \circ x = m^*$, however, implies $m^* \models acc_R$. In the case $I' \neq \emptyset$
we have by $m \models p^{-1}(acc_R)$ an $i \in I' \subseteq I$ with $y_i: Y \to D_i$ and $q': D_i \to G$ in
$M$ with $q' \circ y_i = p'$. Now we are able to decompose pushouts (6) and (5) into
pushouts (4)+(8) and (3)+(7) respectively using the same technique as above,
now from left to right (pushout-pullback decomposition, $l^*, q' \in M$), leading to
a morphism $q: C_i \to H$ in $M$ with $q \circ x_i = p$. This implies $m^* \models acc_R$.

Fig. 5. Decomposition of pushouts.

Case 2.2. $m^* \models acc_R$. We have to show $m \models p^{-1}(acc_R)$. Due to case 2 we have
$p^{-1}(acc_R) \neq \text{true}$. Hence for each morphism $p': Y \to G$ in $M$ with $p' \circ y = m$ we
have to find an $i \in I'$ and a morphism $q': D_i \to G$ in $M$ with $q' \circ y_i = p'$. Given
a morphism $p'$ in $M$ with $p' \circ y = m$ we can construct pushouts (1), (2), (5),
(6) as above, where this time we first construct (6) as pullback leading in the
right-hand side to a morphism $p: X \to H$ in $M$ with $p \circ x = m^*$. Now $m^* \models acc_R$
implies the existence of an $i \in I$ and a morphism $q: C_i \to H$ in $M$ with $q \circ x_i = p$.
Due to pushout (5) the pair $(r^*, p)$ has a pushout complement, so that this is
also true for $x_i: X \to C_i$ with $q \circ x_i = p$. Hence we have an $i \in I'$ and can
decompose pushouts (5) and (6) into pushouts (3)+(7) and (4)+(8) from right
to left leading to a morphism $q': D_i \to G$ in $M$ with $q' \circ y_i = p'$. This implies
$m \models p^{-1}(acc_R)$.

Case 3. The pair $(r, x)$ has a pushout complement, but $I = \emptyset$.

Case 3.1. $m^* \not\models acc_R = NAC(x)$ implies $p \in M$ with $p \circ x = m^*$. As shown in
case 2.1 we obtain $p' \in M$ with $p' \circ y = m$ which implies $m \not\models NAC(y)$.

Case 3.2. $m \not\models p^{-1}(acc_R) = NAC(y)$ implies in a similar way $m^* \not\models NAC(x)$
using the construction in case 2.2.

Proof of Lemma 4. Let $G \Rightarrow_{p,m,m^*} H$ be any direct derivation. We have to show

(*) $m \models acc_L \iff m^* \models acc_R$.

Case 1. $I = \emptyset$. Then $acc_R = acc_L = \text{true}$ which implies (*).

Case 2. $I \neq \emptyset$ and $(r, x)$ has no pushout complement, then $acc_L = \text{true}$. We have
to show $m^* \models acc_R$. Assume $m^* \not\models acc_R$. Then there is $p \in M$ with $p \circ x = m^*$.
Since $(r, m^*)$ has a pushout complement the pushout-pullback decomposition
lemma with $r, p \in M$ implies that also $(r, x)$ has a pushout complement.
Contradiction. Hence $m^* \models acc_R$.

Case 3. Let $I \neq \emptyset$ and $I' = \emptyset$ and $(r, x)$ has a pushout complement. In this case
we have $acc_L = \text{true}$ and we have to show $m^* \models acc_R$. Assume $m^* \not\models acc_R$.
Then there is $p \in M$ with $p \circ x = m^*$ and $i \in I$, $q_i \in M$ with $q_i \circ x_i = p$. This
implies $m^* = q_i \circ x_i \circ x$. Since $(r, m^*)$ has a pushout complement and $r, q_i, x_i \in M$
the pushout-pullback decomposition lemma implies that also $(r^*, q_i \circ x_i)$ and
$(r^*, x_i)$ have a pushout complement. Contradiction to $I' = \emptyset$. Hence $m^* \models acc_R$.



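Case 4. Let $I \neq \emptyset$ and $I' \neq \emptyset$ and $(r, x)$ has a pushout complement. In this case
we use the following negations:

a) $m \not\models acc_L \iff \exists p': Y \to G$, $p' \in M$, $p' \circ y = m$ and $\exists i \in I'\ \exists q'_i: D_i \to G$, $q'_i \in M$, $q'_i \circ y_i = p'$.

b) $m^* \not\models acc_R \iff \exists p: X \to H$, $p \in M$, $p \circ x = m^*$ and $\exists i \in I\ \exists q_i: C_i \to H$, $q_i \in M$, $q_i \circ x_i = p$.

Case 4.1. Let $m \not\models p^{-1}(acc_R) = acc_L$; we have to show $m^* \not\models acc_R$. By
$m \not\models acc_L$ we have $p' \in M$, $i \in I'$ and $q'_i \in M$ as given in a). From the
double pushout of $G \Rightarrow H$ and $m = p' \circ y$ with $p' \in M$ we can construct
pushouts (2), (6), (1), (5) in figure 5 by the pushout-pullback decomposition
lemma with $l, p' \in M$ leading to the commutative diagram in figure 5 with
$p \in M$, $p \circ x = m^*$. Using $q'_i \circ y_i = p'$ we are able to decompose the pushouts (6)
and (5) to pushouts (4), (8) and (3), (7) (by the pushout-pullback decomposition
lemma with $l^*, q'_i \in M$) leading to morphisms $q_i: C_i \to H$, $q_i \in M$ with $q_i \circ x_i = p$
(see figure 5). This implies $m^* \not\models acc_R$ as given in b).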

Case 4.2. Let $m^* \not\models acc_R$; we have to show $m \not\models acc_L$. By $m^* \not\models acc_R$
we have $p \in M$, $i \in I$, and $q_i \in M$ as given in b). From the double pushout of
$G \Rightarrow H$ and $m^* = p \circ x$ with $p \in M$ we can construct pushouts (1), (5),
(2), (6) in figure 5 leading to $p \in M$ with $p \circ x = m^*$. Using $q_i \circ x_i = p$ we
are able to decompose the pushouts (5) and (6) to pushouts (3), (7) and (4), (8)
in figure 5 with $q' \in M$ and $q' \circ y_i = p'$. This implies $m \not\models acc_L$ as given in a).

Abstract. The ability of applications to dynamically discover required
services is a key problem for Web Services. However, this aspect is not 
adequately supported by current Web Services standards. It is our ob- 
jective to develop a formal approach allowing the automation of the 
discovery process. The approach is based on the matching of requestor’s 
requirements for a useful service against service descriptions. 

In the present paper, we concentrate on behavioral compatibility. This
amounts to checking a relation between provided and required operations
described via operation contracts. Graph transformation rules with pos- 
itive and negative application conditions are proposed as a visual formal 
notation for contract specification. We establish the desired semantic 
relation between requestor and provider and prove the soundness and 
completeness of a syntactic notion of matching w.r.t. this relation. 



1 Introduction 

The Web Services platform provides the means to adopt the World Wide Web 
for application integration based on standards for communication, interface de- 
scription, and discovery of services. The prosperity of this technology strongly 
depends on the ability of applications to discover useful services and select those 
that can be safely integrated with existing components. Much work has been 
done to achieve this aim. The interface of an offered service can be specified in 
the Web Service Description Language (WSDL). This specification along with 
some keywords characterizing the service can be published at a UDDI-registry 
which serves as a central information broker and supplies this information to 
potential clients. However, current techniques do not support the automation of 
checking behavioral compatibility of the requestor’s requirements with service 
descriptions. 

In our work the compatibility of provided and required services is defined
via the compatibility of operations constituting the service interfaces: For all
required operations it is necessary to find structurally and behaviorally
compatible provided operations. Structural compatibility requires a correspondence
between provided and required operation signatures. This can be checked using
techniques developed for retrieving functions and components from software
component libraries [19].

We shall concentrate on the second problem - behavioral compatibility. Ser- 
vice requestor and provider specify the behavior of operations by means of con- 
tracts [7], classically expressed in some form of logic. Instead, we propose to 
use graph transformation rules with positive and negative application condi- 
tions for this purpose. They have the advantage of blending intuitively with 
standard modeling techniques like the UML, providing contracts with an oper- 
ational (rather than purely descriptive) flavor. 

Since our approach is presented at the level of models, it can be used for 
different target technologies and languages, provided that they implement a 
Service-oriented Architecture (SOA). Web services represent the most prominent
target platform, but there is no direct link of our approach to, say, XML-based
languages. However, each practical realization of the theoretical concepts requires
a mapping from model-level concepts to a concrete technology. While developing 
such a mapping is outside the scope of this paper, we will briefly discuss the steps 
to be taken in the conclusion. 

The classical interpretation of rules (based, e.g., on the double-pushout 
(DPO) approach to graph transformation [6]), however, is not adequate for con- 
tracts. It assumes that nothing is changed in the transformation beyond what 
is explicitly specified in the rule. A contract, however, represents a potentially 
incomplete specification of an operation. Graph transitions have been proposed 
to provide a looser interpretation of graph transformation rules. The double- 
pullback (DPB) approach [11] defines graph transitions as a generalization of 
DPO transformations, allowing additional changes that are not encoded in the 
rules. 

Moreover, in order to increase the expressiveness of our graphical contract 
language, we consider rules with positive and negative application conditions. 
Negative conditions are well-known to increase the expressive power of rules [9] . 
In the classic approaches, positive application conditions can be encoded by 
extending both the left- and the right-hand side of a rule by the required ele- 
ments: they become part of the context. This is no longer possible, however, in 
the presence of unspecified effects. In fact, the implicit frame condition, that all 
elements present before the application that are not explicitly deleted are still 
present afterwards, is no longer true. Thus, an element matched by a positive 
application condition may disappear, while an element which is shared between 
the left- and the right-hand side must be preserved. 

Based on the notion of graph transition we will define an operational under- 
standing of what it means for a provider rule to match the requestor’s require- 
ments. This shall be captured in a semantic compatibility relation. Since such 
a relation, being based on an infinite set of transitions, cannot be computed
directly, we introduce a syntactic matching relation which provides necessary
and sufficient conditions at the level of rules.

After presenting in the next section the basic ideas of service specification,
in Section 3 we will discuss the service specification matching, concentrating 
on the compatibility between provided and required operation contracts. The 
formalization of these notions, together with a theorem ensuring the soundness 
and completeness of the syntactic relation w.r.t. the semantic one, will be given 
in Section 4. In Sections 5 and 6 we discuss related approaches, conclude, and 
list issues for future work. 



2 Service Specification 

In this section we consider the basic ingredients of service specifications and 
introduce a simple scenario of a Web service for booking a hotel room, which 
may be required, e.g., for a travel booking system. The scenario is not intended to
be complete, but it keeps step with standardization efforts in the travel industry
(see, e.g., [1]) and allows us to exemplify the main ideas of our approach.

We start with the data model of the sample scenario expressed by the UML
class diagram in Fig. 1: A customer (class Customer) intends to book a room
(class Room) in a hotel (class Hotel). Booking information (class BookingInfo)
and possibly a business license code (class LicenseInfo) of a customer (e.g. a travel
bureau) are required to make a reservation. The result of a booking process is
represented by an acknowledgment in the form of a reservation tag (class
ReservTag) and/or a reservation document (class Reservation) containing all details
of the reservation. To avoid additional complications, we assume that service
requestor and provider are working with the same data model, agreed upon in
advance.




Fig. 1. Data model of the hotel reservation scenario. 



According to [14], a Web service is represented by an interface describing a 
collection of operations that are network-accessible through standardized XML 
messaging. An example of provided and required interfaces in UML notation is 
given in Fig. 2. 

An interface contains only structural information about operations. The be- 
havior of these operations shall be specified by contracts. A contract consists of 
a pre-condition specifying the system state before some behavior is executed and 
a post-condition describing the system state after the execution of the behavior.

«interface» ProvidedInterface
    reservHotel(cus:Customer, b:BookingInfo):(ReservTag, Reservation)
    changeReserv(rt:ReservTag, bi:BookingInfo):Reservation
    cancelReserv(rt:ReservTag)

«interface» RequiredInterface
    bookHotel(c:Customer, bi:BookingInfo, l:LicenseInfo):ReservTag
    rejectBooking(rt:ReservTag)

Fig. 2. Provided and required interfaces.

There are different approaches employing formal techniques (e.g., description
logic [15], algebraic specification languages [20], etc.) to contract specification
(see Section 5). The main drawback of these approaches is their lack of usability
in the software industry, where knowledge and skills in the application of logic
formalisms are scarce. Instead, we seek a notation that is close to the standard
software modeling languages (e.g., UML) and has, at the same time, a formal
background that allows automation. This visual formal notation for contracts
is introduced by typed graph transformation [3].

In this context, a class diagram is considered as a directed graph, whose
vertices contain type declarations. Their relation with object diagrams representing
run-time states is expressed by the notion of a type graph (TG) and corresponding
instance graphs [3]. A graph transformation rule $p: L \Rightarrow R$ consists of a
pair of TG-typed instance graphs L, R with compatible structure, i.e., such that
edges that appear in both L and R are connected to the same vertices in both
graphs, vertices with the same name have the same type, etc. The left-hand side
L denotes the pre-condition of the rule and the right-hand side R represents the
post-condition (cf. the part of Fig. 3 marked by the dashed rectangle). The effect
encoded in the rule is defined by the items which have to be deleted (exist only
in L), created (exist only in R), and preserved (exist in both L and R).
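
The reading of a rule's effect as three item sets can be sketched in a few lines. This is a minimal illustration rather than part of the formal development: graphs are encoded, by assumption, as plain sets holding (name, type) pairs for vertices and (source, label, target) triples for edges, and the example rule and its edge label `holds` are hypothetical, only loosely modelled on reservHotel().

```python
def rule_effect(L, R):
    """Split the items of a rule L => R into deleted, created, preserved.

    L and R are sets of items; items with the same name are identified,
    which mirrors the 'compatible structure' assumption in the text.
    """
    return {
        "deleted": L - R,     # items that exist only in L
        "created": R - L,     # items that exist only in R
        "preserved": L & R,   # items that exist in both L and R
    }


# Hypothetical rule: a reservation tag is created for an existing customer.
L = {("cus", "Customer"), ("bi", "BookingInfo")}
R = L | {("rt", "ReservTag"), ("cus", "holds", "rt")}
effect = rule_effect(L, R)
print(sorted(effect["created"]))
```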

In addition, rules are equipped with positive and negative application
conditions specifying required and forbidden contexts. A positive application
condition contains elements whose presence is required by the rule, but without
specifying whether these elements shall be preserved or deleted. This is possible
because, seeing contracts as incomplete specifications of operations, we adopt
a loose semantics for rules which permits unspecified changes. Negative
application conditions represent structures which must not be present when the
rule is applied. Formally, a graph transformation rule with application condition
$p: \{L^P, L^N\} \supseteq L \Rightarrow R$ contains, in addition to TG-typed instance graphs L
and R, graphs $L^{P/N}$ specifying extensions of L by the required or forbidden
elements.

Fig. 3 shows the graph transformation rules for the required operation
bookHotel() and the provided operation reservHotel(). $L_r^P$ and $L_p^P$ represent
the patterns needed for the rule application along with the input parameters of
the operations: information about a customer (vertices c and cus) and booking
details (vertices bi). While the graphs $L_p^P$ and $L_p$ in the provider rule are
identical, the graph $L_r^P$ in the requestor rule additionally contains the vertex
l:LicenseInfo denoting a business license code of the customer. This vertex, being
the input parameter of the operation bookHotel(), is required to be present, but
does not participate in the following computation.

The negative patterns of both rules, $L_r^N$ and $L_p^N$, prevent us from booking a
room which is already reserved by another customer. The reservation of a room
is shown by newly created vertices res, rs, rt and the corresponding edges between
them in the right-hand sides of the rules. The extra association sent in the lower
rule reflects the fact that the reservation document is sent to the customer.



Fig. 3. Transformation rules on object graphs for required operation bookHotel() and
provided operation reservHotel().



To summarize, a service specification consists of a data model, structural (op- 
eration signatures) and behavioral (operation contracts) specifications of opera- 
tions constituting a service. In the next section we discuss specification matching 
and consider an example of matching required and provided operation contracts. 

3 Specification Matching 

In general, specification matching has to deal with all three aspects of
specifications, i.e., data types, signatures, and contracts. For simplicity, we ignore the
matching of data models and discuss the matching of signatures only briefly
(see [19] for a general discussion).



Specification Matching of Web Services 309 



As an example, consider the relation between the required operation
bookHotel() and the provided operation reservHotel() whose signatures and
contracts are depicted in Fig. 3. The signatures of the two operations differ in several
ways. The required operation has an extra input parameter l:LicenseInfo, while
the provided operation contains an extra output parameter of type Reservation.
This does not contradict their compatibility, because any input of the requestor
which is not required may simply be ignored by the provider. Similarly, any
unexpected output of the provider may be skipped by the requestor.
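
The informal rule just stated (surplus requestor inputs are ignored, surplus provider outputs are skipped) can be sketched as a check on multisets of parameter types. This is a deliberately simplified assumption for illustration: it ignores parameter names and subtyping, and the techniques of [19] are more general. The function name is hypothetical.

```python
from collections import Counter


def signature_compatible(required, provided):
    """Each signature is a pair (input_types, output_types) of lists.

    Compatible if the provider needs no input the requestor does not supply
    and delivers every output the requestor expects.
    """
    req_in, req_out = required
    prov_in, prov_out = provided
    # Counter subtraction keeps only positive counts, so an empty result
    # means the left multiset is contained in the right one.
    inputs_ok = not (Counter(prov_in) - Counter(req_in))
    outputs_ok = not (Counter(req_out) - Counter(prov_out))
    return inputs_ok and outputs_ok


book_hotel = (["Customer", "BookingInfo", "LicenseInfo"], ["ReservTag"])
reserv_hotel = (["Customer", "BookingInfo"], ["ReservTag", "Reservation"])
print(signature_compatible(book_hotel, reserv_hotel))
```

With the roles swapped the check fails, since a requestor supplying only a customer and booking information cannot serve a provider that needs a license code.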

To determine the relation between signatures and contracts, we require that 
input and output parameters of each operation are represented by vertices with 
corresponding types in the rules. These dependencies are indicated by the dashed 
arrows in Fig. 3. 

Now we consider behavioral compatibility, i.e., the compatibility of
pre-conditions and effects. Pre-conditions are captured by the patterns $L^P$ and
$L^N$ of positive and negative constraints. In order to perform an operation, the
provider requires input data from the requestor satisfying certain conditions.
In the provider rule of Fig. 3, the input data consists of customer and booking
information. The requestor has to be prepared to deliver this data and to
guarantee the required properties. This can be expressed by a graph homomorphism
from $L_p^P$ to $L_r^P$.

The restrictions on the applicability of the provided operation are also
described via undesired patterns of negative constraints. If the provided
operation imposes more restrictive constraints than anticipated by the required
one, this represents an unexpected limitation. To avoid this situation, one should
check the existence of a graph homomorphism from $L_r^N$ to $L_p^N$.
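
For small patterns, the existence of such a homomorphism can be checked by exhaustive search. The following is a brute-force sketch under simplifying assumptions: graphs are encoded as a dictionary from vertex names to types plus a set of (source, label, target) edges, the search does not enforce compatibility with the embeddings of the left-hand sides, and the vertices and the edge label `requests` in the example are hypothetical.

```python
from itertools import product


def find_homomorphism(src, tgt):
    """Return a type-preserving graph homomorphism src -> tgt, or None.

    Each graph is a pair (vertices, edges): vertices maps a vertex name to
    its type, edges is a set of (source, label, target) triples.
    """
    src_vertices, src_edges = src
    tgt_vertices, tgt_edges = tgt
    names = list(src_vertices)
    # For every source vertex, the target vertices of the same type.
    candidates = [
        [w for w, t in tgt_vertices.items() if t == src_vertices[v]]
        for v in names
    ]
    for choice in product(*candidates):
        h = dict(zip(names, choice))
        # Every source edge must be mapped onto an existing target edge.
        if all((h[s], lab, h[t]) in tgt_edges for s, lab, t in src_edges):
            return h
    return None


provider_pos = ({"cus": "Customer", "bi": "BookingInfo"},
                {("cus", "requests", "bi")})
requestor_pos = ({"c": "Customer", "bi": "BookingInfo", "l": "LicenseInfo"},
                 {("c", "requests", "bi")})
print(find_homomorphism(provider_pos, requestor_pos))
```

The search is exponential in the number of pattern vertices, which is acceptable only because contract patterns are typically small.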

A requestor expects some clearly specified benefit from the invocation of a
service. The effect of the provided operation must not be smaller than specified
by the requestor. That means, the requestor rule $p_r: L_r \Rightarrow R_r$ must be
embedded in the provider rule $p_p: L_p \Rightarrow R_p$, as is the case with the rules in Fig. 3. For
example, the operation reservHotel() additionally creates the edge sent, denoting
the delivery of a reservation document to the customer. This edge is not
present in the requestor contract, because it is regarded as sufficient to obtain a
reservation tag. Nevertheless, the effect of the provided operation fits the client's
requirements.

Next, we will present a formalization for the intuitive ideas obtained from 
the example. 

4 Formalization 

Contract matching can be formalized as a relation between graph transformation 
rules. In this section, we define two such relations, a semantic one based on the 
operational interpretation of rules, and a syntactic one which provides necessary 
and sufficient conditions for the semantic relation. First, however, we review 
some of the basic notions of the double-pushout (DPO) [6] and double-pullback 
(DPB) [11] approach to graph transformation.

4.1 Graph Transformation

Given a graph TG, called type graph, a TG-typed (instance) graph consists of a
graph G together with a typing homomorphism $g: G \to TG$ (cf. Fig. 4 on the
left) associating with each vertex and edge x of G its type $g(x) = t$ in TG. In
this case, we also write $x: t \in G$. A TG-typed graph morphism between two
TG-typed instance graphs $(G, g)$ and $(H, h)$ is a graph morphism $f: G \to H$
which preserves types, that is, $h \circ f = g$.
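
The condition $h \circ f = g$ can be checked directly on a concrete encoding. The sketch below assumes, for illustration only, that a typed graph is a dictionary with a vertex typing and an edge table of (source, target, type) entries; the example graphs and the edge type `books` are hypothetical.

```python
def is_typed_morphism(fv, fe, G, H):
    """Check that (fv, fe) is a type-preserving graph morphism G -> H.

    G and H are dicts with "vertices" (name -> type) and "edges"
    (edge id -> (source, target, type)); fv and fe map vertex names and
    edge ids of G to those of H.
    """
    for v, vtype in G["vertices"].items():
        image = fv.get(v)
        # The image must exist in H and carry the same type: h(f(v)) = g(v).
        if image not in H["vertices"] or H["vertices"][image] != vtype:
            return False
    for e, (src, tgt, etype) in G["edges"].items():
        image = fe.get(e)
        if image not in H["edges"]:
            return False
        # Source, target and edge type must all be preserved.
        if H["edges"][image] != (fv[src], fv[tgt], etype):
            return False
    return True


G = {"vertices": {"c": "Customer", "r": "Room"},
     "edges": {"e1": ("c", "r", "books")}}
H = {"vertices": {"c1": "Customer", "r1": "Room", "r2": "Room"},
     "edges": {"b1": ("c1", "r1", "books")}}
print(is_typed_morphism({"c": "c1", "r": "r1"}, {"e1": "b1"}, G, H))
```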

The DPO approach to graph transformation was originally developed
for vertex- and edge-labelled graphs [6]. Here, we discuss the typed version [3].
Graph transformation rules, also called graph productions, are specified by
pairs of injective graph morphisms $(L \xleftarrow{l} K \xrightarrow{r} R)$, called rule spans. The
left-hand side L contains the items that must be present for an application of the
rule, the right-hand side R those that are present afterwards, and the interface
graph K specifies the “gluing items”, i.e., the objects which are read during
application, but are not consumed.



Fig. 4. Typed graph and graph morphism (left) and double-pushout (or -pullback)
diagram (right).



Definition 1. (rule, graph transformation system) A rule span typed over
TG, in short TG-typed rule span, $s = (L \xleftarrow{l} K \xrightarrow{r} R)$ is a span of injective
TG-typed graph morphisms.

A graph transformation system $GTS = (TG, P, \pi)$ consists of a type graph
TG, a set of rule names P, and a mapping $\pi$ associating with each rule name p
a TG-typed rule span $\pi(p)$. If $p \in P$ is a rule name and $\pi(p) = s$, we say that
$p: s$ is a rule of GTS.

In DPO, transformation of graphs is defined by a pair of pushout diagrams,
a so-called double-pushout construction (cf. Fig. 4 on the right). Operationally
speaking, this means that the elements of G matched by $L \setminus l(K)$ are removed, and
a copy of $R \setminus r(K)$ is added to D.
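
The operational reading of a DPO step can be sketched on a simple set-based encoding. The code below is an illustration under several assumptions, not the categorical construction itself: graphs are untyped pairs of a vertex set and a set of (source, label, target) edges, K is a subgraph of both L and R, the match m is an injective map on the vertices of L, and fresh vertices are named by tagging. The function name and the example are hypothetical.

```python
def dpo_step(L, K, R, G, m):
    """Apply the rule L <- K -> R to G at match m; return H or None.

    Returns None when the dangling condition fails, i.e. when deleting a
    matched vertex would leave an edge of G without source or target.
    """
    (Lv, Le), (Kv, Ke), (Rv, Re), (Gv, Ge) = L, K, R, G

    def image(edge, h):
        return (h[edge[0]], edge[1], h[edge[2]])

    # Remove the elements of G matched by L \ l(K).
    deleted_vertices = {m[v] for v in Lv - Kv}
    deleted_edges = {image(e, m) for e in Le - Ke}
    Dv, De = Gv - deleted_vertices, Ge - deleted_edges
    if any(s in deleted_vertices or t in deleted_vertices for s, _, t in De):
        return None  # dangling edge: no pushout complement exists
    # Add a fresh copy of R \ r(K) to the context graph D.
    comatch = {v: m[v] for v in Kv}
    comatch.update({v: ("new", v) for v in Rv - Kv})
    Hv = Dv | {comatch[v] for v in Rv - Kv}
    He = De | {image(e, comatch) for e in Re - Ke}
    return Hv, He


L = ({"c", "r"}, set())
K = ({"c", "r"}, set())
R = ({"c", "r", "res"}, {("res", "for", "r"), ("c", "holds", "res")})
G = ({"c1", "r1"}, set())
print(dpo_step(L, K, R, G, {"c": "c1", "r": "r1"}))
```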



4.2 Graph Transitions 

The DPO approach ensures that the changes to the given graph H are exactly 
those specified by the rule. However, operation contracts represent specifications 
that are, in general, incomplete, that is, additional effects should be allowed
in the transformation. Therefore, a more liberal notion of rule application is
required which ensures that at least the elements of G matched by $L \setminus l(K)$
are removed, and at least the elements matched by $R \setminus r(K)$ are added. This
interpretation is supported by the double-pullback (DPB) approach to graph
transformation [11], which generalizes DPO by allowing additional, unspecified
changes. Formally, graph transitions are defined by replacing the double-pushout
diagram of a transformation step with a double-pullback.
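
On the level of item sets, the two pullback squares have a simple reading: an item of L (or R) is mapped into the shared context D exactly when it belongs to K. The following sketch checks this condition; it is a set-level simplification that assumes K is a subset of L and R, D is a subset of G and H, and it ignores graph structure and typing. All names in the example are hypothetical.

```python
def is_loose_transition(L, K, R, G, D, H, dL, dR):
    """Set-level check of the double-pullback condition.

    dL: L -> G and dR: R -> H are dicts; K and D act as the shared parts.
    Unspecified deletions (G - D) and creations (H - D) are allowed.
    """
    if not (K <= L and K <= R and D <= G and D <= H):
        return False
    if any(dL[x] not in G for x in L) or any(dR[x] not in H for x in R):
        return False
    if any(dL[x] != dR[x] for x in K):
        return False
    # Pullback (1): exactly the K-items of L are matched into D.
    left_ok = {x for x in L if dL[x] in D} == K
    # Pullback (2): exactly the K-items of R are matched into D.
    right_ok = {x for x in R if dR[x] in D} == K
    return left_ok and right_ok


L = {"c", "bi"}
K = {"c", "bi"}
R = {"c", "bi", "rt"}
G = {"c1", "b1", "x"}            # "x" is deleted although the rule is silent
H = {"c1", "b1", "t1", "doc"}    # "doc" is created although the rule is silent
D = {"c1", "b1"}
dL = {"c": "c1", "bi": "b1"}
dR = {"c": "c1", "bi": "b1", "rt": "t1"}
print(is_loose_transition(L, K, R, G, D, H, dL, dR))
```

A DPO step would additionally require that G and H contain nothing beyond D and the matched rule items, which is what rules out the extra items x and doc above.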

Definition 2. (graph transition) Let $p: s$ be a rule span with
$s = (L \xleftarrow{l} K \xrightarrow{r} R)$. Then, a graph transition from G to H via p, denoted by
$G \stackrel{p}{\leadsto} H$, is a diagram like the right part of Fig. 4 where both (1) and (2) are pullback
squares. A graph transition (or briefly transition) is called injective if both g and
h are injective graph morphisms. It is called faithful if it is injective, and the
morphisms $d_L$ and $d_R$ satisfy the following condition: for all $x, y \in L$ with $x \neq y$,
$y \notin l(K)$ implies $d_L(x) \neq d_L(y)$, and analogously for $d_R$ (see footnote 1).

Each faithful transition can be regarded as a transformation step plus a 
change-of-context [11]. This is modelled by additional deletion and creation of 
elements before and after the actual step. 

4.3 Graph Transitions with Application Conditions 

Using positive and negative application conditions [9], the embedding of the 
left-hand side of a rule in a given graph can be restricted, thus increasing the 
expressiveness of the formalism. 

Definition 3. (rules with application conditions) 

An application condition $A(p) = (AP(p), AN(p))$ for a graph transformation
rule $p : s = (L \xleftarrow{l} K \xrightarrow{r} R)$ consists of two sets of typed graph morphisms
$AP(p), AN(p) \subseteq \mathcal{M}or(L)$ starting from $L$, that contain positive and negative
constraints, respectively. $A(p)$ is called positive (negative) if $AN(p)$ ($AP(p)$) is
empty.

Let $L \xrightarrow{\hat{l}} \hat{L}$ be a positive or negative constraint and $L \xrightarrow{d_L} G$ a typed graph
morphism (cf. Fig. 5). Then $d_L$ P-satisfies $\hat{l}$ if there exists a typed graph
morphism $\hat{L} \xrightarrow{d_{\hat{L}}} G$ such that $d_{\hat{L}} \circ \hat{l} = d_L$. $d_L$ N-satisfies $\hat{l}$ if it does not
P-satisfy $\hat{l}$.

Let $A(p) = (AP(p), AN(p))$ be an application condition and $L \xrightarrow{d_L} G$ a typed
graph morphism. Then $d_L$ satisfies $A(p)$ if it P-satisfies all positive constraints
and N-satisfies all negative constraints from $A(p)$.

A graph transformation rule with application condition is a pair $\hat{p} = (p, A(p))$
consisting of a graph transformation rule $p : s = (L \xleftarrow{l} K \xrightarrow{r} R)$ and an
application condition $A(p)$ for $p$. It is applicable to a graph $G$ via $L \xrightarrow{d_L} G$ if
$d_L$ satisfies $A(p)$.
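The P- and N-satisfaction checks of Definition 3 can be illustrated with a small brute-force sketch. It is not part of the original paper: it ignores typing, represents a graph as a pair (nodes, labelled edges) and a morphism as a dictionary, and the names `extensions`, `p_satisfies`, `n_satisfies` and `satisfies` are hypothetical.

```python
from itertools import product

def extensions(src, tgt, fixed):
    """Yield all graph morphisms src -> tgt (as dicts) that agree with `fixed`."""
    src_nodes, src_edges = src
    tgt_nodes, tgt_edges = tgt
    free = sorted(n for n in src_nodes if n not in fixed)
    for images in product(sorted(tgt_nodes), repeat=len(free)):
        m = dict(fixed)
        m.update(zip(free, images))
        if all((m[s], lab, m[t]) in tgt_edges for (s, lab, t) in src_edges):
            yield m

def p_satisfies(d_L, l_hat, L_hat, G):
    """True if some d_hat: L_hat -> G satisfies d_hat . l_hat = d_L."""
    fixed = {}
    for x, x_hat in l_hat.items():
        # d_hat is forced on the image of l_hat; conflicting demands rule it out.
        if fixed.setdefault(x_hat, d_L[x]) != d_L[x]:
            return False
    return any(True for _ in extensions(L_hat, G, fixed))

def n_satisfies(d_L, l_hat, L_hat, G):
    """N-satisfaction is the negation of P-satisfaction."""
    return not p_satisfies(d_L, l_hat, L_hat, G)

def satisfies(d_L, positives, negatives, G):
    """Check an application condition given as lists of (l_hat, L_hat) pairs."""
    return (all(p_satisfies(d_L, l, Lh, G) for l, Lh in positives)
            and all(n_satisfies(d_L, k, Lh, G) for k, Lh in negatives))

# Illustrative data (an assumption): L is a single node "c"; the constraint
# extends it by a node "h" reached through a "books" edge.
L_hat = ({"c", "h"}, {("c", "books", "h")})
l_hat = {"c": "c"}
G = ({"1", "2", "3"}, {("1", "books", "3")})
```

With this data, the match `{"c": "1"}` P-satisfies the constraint, whereas `{"c": "2"}` N-satisfies it, so the same constraint used negatively would forbid the first match and admit the second.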



1 This condition means that $d_L$ and $d_R$ satisfy the identification condition of the DPO
approach [4] with respect to $l$ and $r$.
Fig. 5. DPB graph transition and rule with application condition.



Two examples of rules with positive and negative constraints are given in 
Fig. 3. 

Definition 4. (graph transition with application condition) Let $\hat{p} = (p, A(p))$
be a graph transformation rule with application condition, where
$s = (L \xleftarrow{l} K \xrightarrow{r} R)$. A graph transition from $G$ to $H$ via the rule $\hat{p}$, denoted
by $G \overset{\hat{p}/d}{\Longrightarrow} H$, is a graph transition via the rule $p$ such that $d_L \in d$ satisfies the
application condition of $\hat{p}$.

Faithful transitions as introduced in Section 4.2 capture our intuition about a 
loose interpretation of contracts which can be specified by graph transformation 
rules with application conditions. 



4.4 Semantic Compatibility and Syntactic Matching Relations 

The concept of transition allows us to formalize semantically the desired notion 
of compatibility: Provider and requestor rules are semantically compatible if (1) 
every transition via the provider rule can be regarded as a transition via the 
requestor rule, and (2) applicability of the requestor rule implies applicability of 
the provider rule. 

Definition 5. (semantic compatibility) Let $\hat{p}_1 = (p_1, A(p_1))$ and $\hat{p}_2 =
(p_2, A(p_2))$ be graph transformation rules with application conditions, where
$s_1 = (L_1 \xleftarrow{l_1} K_1 \xrightarrow{r_1} R_1)$ and $s_2 = (L_2 \xleftarrow{l_2} K_2 \xrightarrow{r_2} R_2)$. We say that $\hat{p}_1$
is semantically compatible with $\hat{p}_2$, in symbols $\hat{p}_2 \models \hat{p}_1$, iff

1. for all spans $t = (G \xleftarrow{g} D \xrightarrow{h} H)$ and transitions $G \overset{\hat{p}_2/d_2}{\Longrightarrow} H$, there exists
a transition $G \overset{\hat{p}_1/d_1}{\Longrightarrow} H$ using the same bottom span $t$, where $d_1 =
(d_{L_1}, d_{K_1}, d_{R_1})$ and $d_2 = (d_{L_2}, d_{K_2}, d_{R_2})$ (cf. Fig. 6 on the right), and

2. if $d_{L_1} : L_1 \to G$ satisfies the application condition of $\hat{p}_1$, then $d_{L_2} : L_2 \to G$
satisfies the application condition of $\hat{p}_2$.

This definition reflects the desired relation between contracts, but it can hardly
serve as the basis of an algorithm for determining contract compatibility. Therefore, we
introduce a relation of syntactic matching that encompasses ideas presented in
Section 3 and has a more constructive character.

Definition 6. (syntactic matching) Let $\hat{p}_1 = (p_1, A(p_1))$ and $\hat{p}_2 =
(p_2, A(p_2))$ be graph transformation rules with application conditions, where
$s_1 = (L_1 \xleftarrow{l_1} K_1 \xrightarrow{r_1} R_1)$ and $s_2 = (L_2 \xleftarrow{l_2} K_2 \xrightarrow{r_2} R_2)$. We say that $\hat{p}_1$
syntactically matches with $\hat{p}_2$, in symbols $\hat{p}_2 \vdash \hat{p}_1$, iff

1. there exist graph homomorphisms $h_L : L_1 \to L_2$, $h_K : K_1 \to K_2$, and
$h_R : R_1 \to R_2$ such that the diagrams (a) and (b) represent a faithful tran-
sition (cf. Fig. 6 on the right), and

2. (a) for all $\hat{l}_2 : L_2 \to \hat{L}_2 \in AP(p_2)$ there exist $\hat{l}_1 : L_1 \to \hat{L}_1 \in AP(p_1)$ and a
graph homomorphism $h^P_{\hat{L}} : \hat{L}_2 \to \hat{L}_1$ such that the corresponding square
in Fig. 6 on the left commutes, and

(b) for all $\hat{k}_1 : L_1 \to \hat{L}_1 \in AN(p_1)$ there exist $\hat{k}_2 : L_2 \to \hat{L}_2 \in AN(p_2)$ and a
graph homomorphism $h^N_{\hat{L}} : \hat{L}_1 \to \hat{L}_2$ such that the corresponding square
in Fig. 6 on the left commutes.

Fig. 6. Semantic compatibility and syntactic matching relations.

An example of syntactic matching is given in Section 3 for the graph trans-
formation rules specifying the contracts of the required operation bookHotel()
and the provided operation reservHotel().

Next, we present a theorem ensuring the soundness and completeness of our 
approach. 

Theorem 1. (soundness and completeness of matching) Assume two
graph transformation rules with application conditions $\hat{p}_1$ and $\hat{p}_2$. Then $\hat{p}_2 \vdash \hat{p}_1$
if and only if $\hat{p}_2 \models \hat{p}_1$.

Proof Sketch. Soundness: Assume two graph transformation rules with applica-
tion conditions $\hat{p}_1$ and $\hat{p}_2$. We show that $\hat{p}_2 \vdash \hat{p}_1$ implies $\hat{p}_2 \models \hat{p}_1$, i.e., Def. 6
entails Def. 5.1/2, respectively.

1. It is necessary to demonstrate that for each faithful transition via the second
rule there is a faithful transition via the first rule. By assumption, there exist
graph homomorphisms between the first and the second rule $(h_L, h_K, h_R)$,
forming a faithful transition (cf. Fig. 6 on the right). Now, both transitions
can be vertically composed using the composition of the underlying pull-
back squares. The faithfulness of the composed transition follows from the
preservation of the identification condition under the composition of pullback
squares.
2. We have to demonstrate that if $d_{L_1}$ satisfies the application condition of $\hat{p}_1$,
then $d_{L_2}$ satisfies the application condition of $\hat{p}_2$. This induces two problems
(cf. Def. 3):

(a) $d_{L_2}$ (cf. Fig. 6 on the left) must P-satisfy all positive constraints of $\hat{p}_2$.
Since $h^P_{\hat{L}}$ exists by assumption (Def. 6.2(a)), $d_{\hat{L}_2}$ can be constructed as
$d_{\hat{L}_1} \circ h^P_{\hat{L}}$. It is not difficult to see that $d_{\hat{L}_2} \circ \hat{l}_2 = d_{L_2}$.

(b) $d_{L_2}$ (cf. Fig. 6 on the left) must N-satisfy all negative constraints of $\hat{p}_2$,
i.e., there does not exist $d_{\hat{L}_2} : \hat{L}_2 \to G$. This can be proved by assuming
the existence of $d_{\hat{L}_2}$ and deriving a contradiction.

Combining (a) and (b), we obtain that $d_{L_2}$ satisfies the application condition
of $\hat{p}_2$.

Completeness: Assume two graph transformation rules with application condi-
tions $\hat{p}_1$ and $\hat{p}_2$. We prove that $\hat{p}_2 \models \hat{p}_1$ implies $\hat{p}_2 \vdash \hat{p}_1$, i.e., Def. 5 entails Def. 6.1/2,
respectively.

1. To show that there exist graph homomorphisms between the first and the
second rule, we can apply $\hat{p}_2$ to the graph $L_2$ at the identity mapping. If
$\hat{p}_2$ is applicable, we can create a (faithful) transition $L_2 \overset{\hat{p}_2/id}{\Longrightarrow} R_2$ with the
span $t = (L_2 \xleftarrow{l_2} K_2 \xrightarrow{r_2} R_2)$. Consequently (see Def. 5.1), there exists a
faithful transition $L_2 \overset{\hat{p}_1/h}{\Longrightarrow} R_2$ via the first rule using the same bottom span
$t$. If we cannot apply $\hat{p}_2$ to the graph $L_2$, then the premise is false, and the
conclusion is trivially true.

2. Two questions have to be considered for the second part of Def. 6:

(a) Existence of a graph homomorphism $h^P_{\hat{L}} : \hat{L}_2 \to \hat{L}_1$ between the graphs
representing the patterns for positive constraints of the rules. We can
apply $\hat{p}_1$ to the graph $\hat{L}_1$ at $d_{L_1} := \hat{l}_1 \in AP(p_1)$. Since $d_{L_1}$ satisfies $\hat{l}_1$,
there exists $d_{L_2}$ satisfying the constraint $\hat{l}_2 \in AP(p_2)$ of $\hat{p}_2$ (see Def. 5.2).
That implies the existence of a graph homomorphism $h^P_{\hat{L}} : \hat{L}_2 \to \hat{L}_1$. The
commutativity of the corresponding square in Fig. 6 on the left follows
from the commutativity of the diagrams (1), (2), (3), and (4). The com-
mutativity of the diagrams (1), (2), and (3) can easily be shown. To prove
that the diagram (4) commutes, one assumes the existence of a typed
graph morphism $m : L_1 \to \hat{L}_1 := d_{L_2} \circ h_L \neq d_{L_1}$ $(= \hat{l}_1)$ and derives a
contradiction.

(b) Existence of a graph homomorphism $h^N_{\hat{L}} : \hat{L}_1 \to \hat{L}_2$ between the graphs
representing patterns for negative constraints of the rules. We can apply
$\hat{p}_1$ to the graph $\hat{L}_2$ at some $d_{L_1}$. By Def. 5.2, if $d_{L_1}$ satisfies $\hat{k}_1 \in AN(p_1)$,
then $d_{L_2}$ satisfies $\hat{k}_2 \in AN(p_2)$. We can reformulate this as: if $d_{L_2}$ does
not satisfy $\hat{k}_2$, then $d_{L_1}$ does not satisfy $\hat{k}_1$.

Now we try to apply $\hat{p}_2$ to the graph $\hat{L}_2$ at $d_{L_2} := \hat{k}_2$. It is possible to
see that the premise of the statement above is true ($d_{L_2}$ does not satisfy
$\hat{k}_2$), and so is the conclusion, i.e., $d_{L_1}$ does not satisfy $\hat{k}_1$. This may happen
only if there exists a graph homomorphism $h^N_{\hat{L}} : \hat{L}_1 \to \hat{L}_2$, which was
required to be proved. The commutativity of the corresponding square
in Fig. 6 on the left follows from the commutativity of the diagrams
(1), (2), (3), and (4). The only problem here is the commutativity of the
diagram (4), which can be demonstrated in the same way as in (a).

Combining the two parts of the proof, we can conclude that $\hat{p}_2 \vdash \hat{p}_1$ if and only
if $\hat{p}_2 \models \hat{p}_1$.

Two final sections discuss approaches related to our work and summarize the 
main results. 

5 Related Work 

The problem of discovering a component or service satisfying specific require- 
ments is not a new one. A significant amount of work has been done in the area 
of Component-Based Software Engineering (CBSE) to increase the reliability 
and maintainability of software through reuse. Central here is the development 
of the techniques for reasoning about component descriptions and component 
matching. These techniques differ in the constituents involved in the match- 
ing procedure (e.g., operation signatures, behavioral specifications) and the way 
these constituents are specified (e.g., logic formulas, algebraic specification lan- 
guages). 

One of the most elaborate approaches, along with a thorough overview of
related work, is presented by Zaremski and Wing in [19] and [20], who have
developed matching procedures for two levels of semantic information
(signatures and specifications) and two levels of granularity (functions
and modules). Structural and behavioral information about components is given
using the algebraic specification language Larch/ML.

A pre/post-condition style of specification, like in [20], is also utilized by 
other authors. For example, in the work of Perry [17] operations are specified 
with pre- and post-conditions in first order logic. Order-sorted predicate logic 
(OSPL) is employed by Jeng and Cheng in [13] for component and function 
specifications. Basically, two features differentiate our approach from the works 
described above. The first one is the operational interpretation of graph transfor- 
mation rules. Second, we have proposed a visual, model-based approach which 
provides better usability, because it can be more easily integrated into the stan- 
dard model-driven techniques for software development. 

Matching required and provided interfaces is also an issue present in mod- 
ularization approaches for algebraic specification languages and typed graph 
transformation systems (TGTS), in particular for the composition of modules. 
An algebraic module specification MOD in [5] consists of four parts called
import IMP, export EXP, parameter PAR, and body BOD. All components are
given by algebraic specifications, which are combined through specification
morphisms. TGTS-modules in [8] are composed of three TGTS, IMP, EXP, and
BOD, the only difference being the absence of a parameter part. IMP and BOD
are related by a simple inclusion morphism, whereas EXP and BOD are con- 
nected by a refinement morphism, allowing a temporal or spatial decomposition 
of rules. 
Composition of modules $MOD_1$ and $MOD_2$ is based on the morphisms
connecting the import interface of $MOD_1$ with the export interface of $MOD_2$.
Relating required (import) and provided (export) services, this morphism plays
a role similar to that of the matching relation in this paper. A detailed comparison
with [5] is hampered by the conceptual distance between the two formalisms,
i.e., graph transformation and algebraic specifications.

The interconnection mechanism in [8] is incomparable to our notion: On the 
one hand, the cited paper allows an extension and renaming of types, an issue 
that we did not discuss at all in this paper. On the other hand, the relation 
between rules is more general in our case, i.e., it is the most general notion 
allowing the entailment of applicability from import to export as well as the 
entailment of effects in the opposite direction. 

Finally, let us mention several approaches in the Semantic Web context. 
Paolucci et al. in [16] propose a solution for automation of Web service discovery 
based on DAML-S, a DAML-based language for service description. While re- 
quired and provided service descriptions contain specifications of pre-conditions 
and effects, the matching procedure in [16] merely compares input and output 
parameters of services. Basically, such a kind of matching can be considered as 
an extended variant of the structural compatibility. Sivashanmugan et al. in [18] 
extend WSDL using DAML+OIL ontologies to support semantics-based discov- 
ery of Web Services. The authors emphasize the importance of matchmaking not 
only for input and output parameters, but also for functional specifications of 
operations. Since the work contains only conceptual descriptions of the matching 
procedure, we cannot provide a more formal analysis.

Hausmann et al. in [10] use graph transformation rules defined over a domain 
ontology to represent service specifications and introduce a relation between 
them. The strength of this work is the implementation of the matching proce- 
dure in a prototypical tool chain. Informally, the basic ideas introduced in [10] 
are similar to those of this paper, but there are a number of technical differences.
While the matching procedure has been precisely defined in a set-theoretic no- 
tation, the authors of [10] did not provide a formal operational semantics along 
with a semantic compatibility relation. As a consequence, the correctness of 
the proposed formalism has been justified only by means of examples. Besides, 
the lack of application conditions limits the expressiveness of contracts to basic 
positive statements. 

Another approach is presented by Pahl in [15]. He proposed to use descrip- 
tion logic for service specification and introduced a contravariant inference rule 
capturing service matching. This approach is closely related to ours because of 
the pre/post style of service specification and the contravariant character of the 
matching. However, as the author correctly states, the expressiveness of
description logic has negative implications for the complexity of reasoning. Unlike
in our approach, decidability of the service matching in [15] can be
guaranteed only under certain restrictions (the set of predicates must
be closed under negation). Although the sub-graph problem is, in general, not
solvable in polynomial time, there exist a number of heuristic solutions which
make it appear realistic.
6 Conclusion

In this paper we have developed formal concepts for a UML-based approach 
to service specification and matching based on conditional graph transforma- 
tion rules for modeling operation contracts. We have used a loose interpretation 
of rules via DPB graph transitions to obtain an operational understanding of 
contracts and a corresponding semantic compatibility relation, and we have es- 
tablished a syntactic relation providing necessary and sufficient conditions for 
the semantic one. 

Several issues remain for future work. First, the formal presentation needs to 
be extended to typed graphs with attributes [12] and sub-typing [2]. Second, the 
compatibility relation should be improved to allow the matching of one requestor 
rule against a spatial/temporal composition of several provider rules [8]. 

Third, the practical application of the concepts discussed in this paper re- 
quires a mapping to the Web service platform, consisting of XML-based stan- 
dards like SOAP and WSDL. The first part of this mapping has to relate the 
type systems of both levels, i.e. a type graph of GTS and an XML-schema of 
a WSDL specification. The second part should associate operation signatures 
given by UML interfaces with the corresponding specifications of operations in 
a WSDL document. The last part should provide the mapping of contracts into 
an adequate XML-representation. This should be integrated with WSDL and 
should support the implementation of the corresponding matching procedures. 
While there are isolated examples for all three mappings in the literature, their
integration remains an open issue.
Abstract. We show how edge-labelled graphs can be used to represent first-order
logic formulae. This gives rise to recursively nested structures, in which each
level of nesting corresponds to the negation of a set of existentials. The model is
a direct generalisation of the negative application conditions used in graph rewrit-
ing, which count a single level of nesting and are thereby shown to correspond to
the fragment $\exists\neg\exists$ of first-order logic. Vice versa, this generalisation may be used
to strengthen the notion of application conditions. We then proceed to show how
these nested models may be flattened to (sets of) plain graphs, by allowing some
structure on the labels. The resulting formulae-as-graphs may form the basis of a
unification of the theories of graph transformation and predicate transformation.



1 Introduction 

Logic is about expressing and proving constraints on mathematical models. As we 
know, such constraints can for instance be denoted in a special language, such as First- 
Order Predicate Logic (FOL); formulae in that language can be interpreted through a 
satisfaction relation. In this paper we study a different, non-syntactic representation, 
based on graph theory, which we show to be equivalent to FOL. The advantage of this 
alternative representation is that it ties up notions from algebraic graph rewriting with 
logic, with potential benefits to the former. 

We start out with the following general observation. A condition that states that a 
certain sub-structure should exist in a given model can often be naturally expressed by 
requiring the existence of a matching of (a model of) that sub-structure in the model in 
question. Illustrated on the edge-labeled graphs studied in this paper: The existence of
entities $x$ and $y$ related by $a(x, y)$ and $b(y, x)$ (where $a$ and $b$ are binary relations) can
be expressed by requiring the existence of a matching of the following graph:




Note that this matching only implies that this sub-structure exists somewhere in the
target graph; it does not say, for instance, that $a(x, y)$ and $b(y, x)$ always go together,
or that the entities playing the roles of $x$ and $y$ are unrelated to other entities, or even
that these entities are distinct from one another.
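To make the notion of matching concrete, the following brute-force sketch enumerates all matchings of the pattern "$a(x, y)$ and $b(y, x)$" in a small target graph. The representation (a graph as a pair of a node set and a set of labelled edge triples), the function name `matchings` and the sample target are assumptions made for illustration only.

```python
from itertools import product

def matchings(pattern, target):
    """Yield every graph morphism pattern -> target as a dict node -> node."""
    p_nodes, p_edges = pattern
    t_nodes, t_edges = target
    names = sorted(p_nodes)
    for images in product(sorted(t_nodes), repeat=len(names)):
        m = dict(zip(names, images))
        # A morphism must map every pattern edge onto a target edge.
        if all((m[v], a, m[w]) in t_edges for (v, a, w) in p_edges):
            yield m

pattern = ({"x", "y"}, {("x", "a", "y"), ("y", "b", "x")})
target = ({1, 2, 3},
          {(1, "a", 2), (2, "b", 1), (3, "a", 3), (3, "b", 3), (1, "c", 3)})

found = list(matchings(pattern, target))
# found == [{"x": 1, "y": 2}, {"x": 3, "y": 3}]
```

The second matching sends both $x$ and $y$ to node 3, which illustrates the remark that a matching does not force the matched entities to be distinct; the extra `c` edge shows that it does not forbid additional relations either.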

One particular context in which matchings of this kind play a prominent role is that 
of algebraic graph rewriting (see [3,8] for an overview). The basic building blocks of 
a given graph rewrite system are production rules, which (among other things) have
so-called “left hand side” (LHS) graphs. A production rule applies to a given graph
only if the rule’s LHS can be matched by that graph;¹ moreover, the effect of the rule
application is computed relative to the matching.

The class of conditions that matchings can express is fairly limited. For instance, no
kind of negative information, such as the absence of relations or the distinctness of $x$
and $y$ in the example above, can be expressed in this manner. In the context of algebraic
graph transformation, this observation has led to the definition of so-called negative
application conditions (NACs) (see [10]). A NAC itself takes the form of a matching
of the “base” graph into another. A LHS with NACs, interpreted as a logical constraint,
is satisfied if a matching of the LHS exists which, however, cannot be factored through
any of the NACs.² Consider, for instance, the following example of a graph with two
NACs:



The base graph (here drawn on top) expresses, as before, that there are two entities, say
$x$ and $y$, such that $a(x, y)$ and $b(y, x)$; the first NAC adds the requirement that there
is no $z$ such that $c(y, z)$; and the second NAC adds the requirement that $x$ and $y$ are
distinct. The combined structure thus represents the formula

$$\exists x, y : a(x, y) \wedge b(y, x) \wedge (\nexists z : c(y, z)) \wedge x \neq y .$$

Formally, a graph satisfies this condition if there is a matching of the base graph into it
that cannot be factored through a matching of either of the NAC target graphs.

Although, clearly, NACs strengthen the expressive power of graph conditions, it is
equally clear that there are still many properties that cannot be expressed in this way.
For instance, any universally quantified positive formula is outside the scope of NACs.
As a very simple example consider $\exists x : \forall y : x = y$, expressing that there exists precisely
one entity. However, we can easily add more layers of “application conditions”. For
instance, the existence of a unique entity is represented by the following structure:



1 In the single-pushout approach [8], the existence of a matching is also sufficient; in the double-
pushout approach [3], there are some further conditions on the applicability of a rule.

2 An important difference is that, in graph transformation, the issue is not whether a matching of
a given LHS exists but to find all matchings. Seen in this light, the results of this paper concern
the applicability of production rules.
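The NAC semantics described above (a matching of the base graph that cannot be factored through any NAC) can be sketched by brute force. The encoding below is an assumption made for illustration: a NAC is a pair of a target graph and a dictionary embedding the base graph into it, and the names `extensions` and `satisfies_with_nacs` are hypothetical.

```python
from itertools import product

def extensions(src, tgt, fixed):
    """Yield all graph morphisms src -> tgt (as dicts) that agree with `fixed`."""
    src_nodes, src_edges = src
    tgt_nodes, tgt_edges = tgt
    free = sorted(n for n in src_nodes if n not in fixed)
    for images in product(sorted(tgt_nodes), repeat=len(free)):
        m = dict(fixed)
        m.update(zip(free, images))
        if all((m[s], lab, m[t]) in tgt_edges for (s, lab, t) in src_edges):
            yield m

def factors_through(m, nac, target):
    """True if matching m equals n . emb for some morphism n: nac graph -> target."""
    nac_graph, emb = nac
    fixed = {}
    for v, v_img in emb.items():
        if fixed.setdefault(v_img, m[v]) != m[v]:
            return False  # emb merges nodes that m keeps apart
    return any(True for _ in extensions(nac_graph, target, fixed))

def satisfies_with_nacs(base, nacs, target):
    """True if some matching of base into target factors through no NAC."""
    return any(not any(factors_through(m, nac, target) for nac in nacs)
               for m in extensions(base, target, {}))

base = ({"x", "y"}, {("x", "a", "y"), ("y", "b", "x")})
# First NAC: y has an outgoing c edge. Second NAC: x and y coincide.
nac_c = (({"x", "y", "z"}, {("x", "a", "y"), ("y", "b", "x"), ("y", "c", "z")}),
         {"x": "x", "y": "y"})
nac_eq = (({"xy"}, {("xy", "a", "xy"), ("xy", "b", "xy")}),
          {"x": "xy", "y": "xy"})

g_ok = ({1, 2}, {(1, "a", 2), (2, "b", 1)})
g_c = ({1, 2}, {(1, "a", 2), (2, "b", 1), (2, "c", 1)})
g_loop = ({3}, {(3, "a", 3), (3, "b", 3)})
```

Here `g_ok` satisfies the condition, `g_c` is rejected by the first NAC, and `g_loop` is rejected by the second, matching the reading of the formula above.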
At this point, readers familiar with the theory of existential graphs (see, e.g., [14,5])
may recognize the analogy between this stacking of graphs and the cuts used there to
represent negation. See also the discussion of related work in Sect. 6. This paper is
devoted to working out the ensuing theory in the category of edge-labeled graphs, pro-
viding a direct connection to the setting of algebraic graph transformation. We present
the following results:

- Graph conditions with a stack of application conditions of height $n$, interpreted
through matchings as sketched above, are expressively equivalent to a fragment of
($\forall$-free) FOL that may be denoted $\exists(\neg\exists)^n$; for a precise statement see Th. 4. It
is known that a higher value of $n$ gives rise to a truly more expressive fragment of
FOL; that is, for each $n$ there is a property that can be expressed in $\exists(\neg\exists)^{n+1}$ and
not in $\exists(\neg\exists)^n$. We prove equivalence through compositional translations back and
forth.

As a corollary, NACs carry the expressive power of $\exists\neg\exists$, which indeed excludes
universally quantified positive formulae, since those would translate to $\neg\exists\neg$ at the
least. Another consequence is that more highly stacked graph conditions, providing
the full power of FOL, can be integrated seamlessly into the theory of algebraic
graph transformation. In the conclusions we briefly mention two such extensions
that have been studied in the graph transformation literature.

- Graph conditions may be flattened, without loss of information, to simple, edge- 
labeled graphs, provided we add structure to the labels to reflect the stack height at 
which the nodes and edges were originally introduced. With hindsight this structure 
on the labels is indeed strongly reminiscent of the cuts in existential graphs, except 
that we avoid the need for representing cuts as an explicit structure on the graphs. 
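As a small worked illustration of the first result (our own rewriting, not taken from the paper), the unique-entity property from the introduction can be brought into $\forall$-free form by the usual duality between the quantifiers:

```latex
\[
  \exists x\, \forall y : x = y
  \;\equiv\;
  \exists x : \neg \exists y : \neg (x = y) .
\]
```

The right-hand side has the shape $\exists\neg\exists\neg$, so the positive atom $x = y$ sits under two negations. This is consistent with the remark that such a formula cannot be captured by a single layer of NACs and needs a stack of height two.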

The remainder of this paper is structured as follows. In Sect. 2 we recall some defini- 
tions. In Sect. 3 we define graph predicates and provide a translation to FOL; in Sect. 4 
we define the reverse translation. Sect. 5 shows how to flatten graph predicates. Finally, 
in Sect. 6 we summarize the results and discuss related work, variations and extensions. 

2 Basic Definitions 

We assume a countable universe of variables Var, ranged over by $x, y, z$, and a countable
universe of relation (i.e., predicate) symbols $\mathrm{Rel} \subseteq \mathrm{Lab}$ (not including $=$), ranged over
by $a$. The following grammar defines FOL, the language of first-order logic with equality
and binary relations:

$$\phi ::= x = x \mid a(x, x) \mid \neg\phi \mid \bigvee\Phi \mid \bigwedge\Phi \mid \exists X : \phi .$$

Here $\Phi \subseteq \mathrm{FOL}$ and $X \subseteq \mathrm{Var}$ are finite sets of formulae and variables, respectively. (So
$\exists X : \phi$ is not second-order quantification but finitary first-order quantification.) We use
$fv(\phi)$ [$fv(\Phi)$] for $\phi \in \mathrm{FOL}$ [$\Phi \subseteq \mathrm{FOL}$] to denote the set of free variables in $\phi$ [$\Phi$] (with
the usual definition). Note that all sets of free variables are finite.

As models for FOL we use edge-labeled graphs without parallel edges. For this
purpose we assume a countable universe of nodes Node, ranged over by $v, w$, and a
countable universe of labels $\mathrm{Lab} \supseteq \mathrm{Rel}$, ranged over by $\ell$. Except in Sect. 5, we will
have $\mathrm{Lab} = \mathrm{Rel}$.
Definition 2.1 (graphs).

- A graph is a tuple $G = (N, E)$ where $N \subseteq \mathrm{Node}$ is a set of nodes and $E \subseteq
N \times \mathrm{Lab} \times N$ a set of edges.

- If $G$ and $H$ are graphs, a graph morphism from $G$ to $H$ is a tuple $\mu = (G, H, f)$
where $f : N_G \to N_H$ is such that $(f(v), \ell, f(w)) \in E_H$ for all $(v, \ell, w) \in E_G$.

- The category of graphs, denoted Graph, has graphs as objects and graph mor-
phisms as arrows, with the obvious notions of identity and composition.

We denote the components of a graph $G$ by $N_G$ and $E_G$ (as already seen above), and
we use $\mu : G \to H$ or $\mu \in \mathrm{Graph}(G, H)$ to denote that $\mu$ is a morphism from $G$ to $H$. For
$\mu : G \to H$ we denote $src(\mu) = G$ and $tgt(\mu) = H$. The following result is standard.

Proposition 1. Graph is a cartesian closed category with all limits and colimits. 

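Every countable set $A$ gives rise to a discrete graph $\langle A \rangle$, with $N = A$ and $E = \emptyset$. We
also use $(v \xrightarrow{a} w)$ to denote the one-edge graph with $N = \{v, w\}$ and $E = \{(v, a, w)\}$.
Furthermore, for every $X \subseteq Y \subseteq \mathrm{Var}$ ($Y$ countable) we use $emb[X, Y] = (\langle X \rangle, \langle Y \rangle, id_X)$
for the morphism that embeds $\langle X \rangle$ in $\langle Y \rangle$, and for every $G \in \mathrm{Graph}$ we write $id_G$ for
the identity morphism on $G$. The following rules define a modeling relation $\models$ between
graph morphisms $\theta \in \mathrm{Graph}(\langle X \rangle, G)$ with $X \supseteq fv(\phi)$ (which combine the valuation
of the logical variables in $X$ with the algebraic structure of $G$) and FOL-formulae $\phi$:

$$\begin{array}{ll}
\theta \models x = y & \text{if } \theta(x) = \theta(y) \\
\theta \models a(x, y) & \text{if } (\theta(x), a, \theta(y)) \in E_G \\
\theta \models \neg\phi & \text{if } \theta \not\models \phi \\
\theta \models \bigvee\Phi & \text{if } \theta \models \phi \text{ for some } \phi \in \Phi \\
\theta \models \bigwedge\Phi & \text{if } \theta \models \phi \text{ for all } \phi \in \Phi \\
\theta \models \exists Y : \phi & \text{if } \eta \models \phi \text{ such that } \theta = \eta \circ emb[X, X \cup Y] .
\end{array}$$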
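The modeling relation can be read as a recursive evaluator. The sketch below is an illustration under assumptions of our own: formulae are nested tuples with tags such as `"eq"` and `"rel"`, a valuation $\theta$ is a dictionary from variable names to nodes, and the function name `holds` is hypothetical. Following the rule for $\exists Y$ literally, variables of $Y$ that are already valued by $\theta$ keep their value, because $\eta$ must agree with $\theta$ on $X$.

```python
from itertools import product

def holds(theta, phi, graph):
    """Decide theta |= phi over an edge-labelled graph (nodes, edges)."""
    nodes, edges = graph
    tag = phi[0]
    if tag == "eq":                      # ("eq", x, y)
        return theta[phi[1]] == theta[phi[2]]
    if tag == "rel":                     # ("rel", a, x, y)
        return (theta[phi[2]], phi[1], theta[phi[3]]) in edges
    if tag == "not":                     # ("not", phi)
        return not holds(theta, phi[1], graph)
    if tag == "or":                      # ("or", [phi, ...]); empty list is false
        return any(holds(theta, f, graph) for f in phi[1])
    if tag == "and":                     # ("and", [phi, ...]); empty list is true
        return all(holds(theta, f, graph) for f in phi[1])
    if tag == "exists":                  # ("exists", [y, ...], phi)
        new = [y for y in phi[1] if y not in theta]
        for images in product(sorted(nodes), repeat=len(new)):
            eta = {**theta, **dict(zip(new, images))}
            if holds(eta, phi[2], graph):
                return True
        return False
    raise ValueError(f"unknown formula tag: {tag!r}")

# The NAC example formula from the introduction.
phi = ("exists", ["x", "y"], ("and", [
    ("rel", "a", "x", "y"),
    ("rel", "b", "y", "x"),
    ("not", ("exists", ["z"], ("rel", "c", "y", "z"))),
    ("not", ("eq", "x", "y")),
]))

g_ok = ({1, 2}, {(1, "a", 2), (2, "b", 1)})
g_c = ({1, 2}, {(1, "a", 2), (2, "b", 1), (2, "c", 1)})
g_loop = ({3}, {(3, "a", 3), (3, "b", 3)})
```

Evaluating `holds({}, phi, g)` for the three sample graphs accepts `g_ok` and rejects the other two, once because of the `c` edge and once because $x$ and $y$ coincide.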

3 Graph Predicates and Conditions 

For finite $G \in \mathrm{Graph}$ we define $\mathrm{Pred}[G]$, $\mathrm{Cond}[G]$ as the smallest sets such that

- $p \subseteq \mathrm{Cond}[G]$ with $p$ finite implies $p \in \mathrm{Pred}[G]$;

- $\alpha \in \mathrm{Graph}(G, H)$ and $p \in \mathrm{Pred}[H]$ implies $(\alpha, p) \in \mathrm{Cond}[G]$.

The elements of $\mathrm{Pred}[G]$ are called (graph) predicates on $G$ and those of $\mathrm{Cond}[G]$
(graph) conditions on $G$. Graph predicates can be thought of as finitely branching trees
of connected graph morphisms, of finite depth; a graph condition is a single branch of
such a tree. For $c \in \mathrm{Cond}[G]$ we write $\alpha_c$ for the morphism component, $T_c = tgt(\alpha_c)$
for the target of $\alpha_c$ and $p_c$ for the predicate component; hence $c = (\alpha_c, p_c)$. The depth
of predicates and conditions, which is quite useful as a basis for inductive proofs, is
defined by:

$$\begin{array}{l}
depth(p) = \max \{ depth(c) \mid c \in p \} \\
depth(c) = 1 + depth(p_c) .
\end{array}$$
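The mutually recursive definition of depth translates directly into code. In this sketch, which is an illustration only, a predicate is a list of conditions and a condition is a pair of an opaque morphism placeholder and a predicate; taking the maximum over the empty set to be 0 is made explicit through `default=0`.

```python
def depth_pred(p):
    """depth(p) = max{depth(c) | c in p}, with the empty predicate at depth 0."""
    return max((depth_cond(c) for c in p), default=0)

def depth_cond(c):
    """depth(c) = 1 + depth(p_c) for a condition c = (alpha_c, p_c)."""
    _alpha, p_c = c
    return 1 + depth_pred(p_c)

# Morphisms are irrelevant for the depth, so strings stand in for them here.
empty = []
plain_matching = [("base", [])]
with_two_nacs = [("base", [("nac1", []), ("nac2", [])])]
```

The three sample predicates have depth 0, 1 and 2 respectively; the last one has the shape of a LHS with two NACs.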
Fig. 1. Graph predicates for $lt(x, y) \vee lt(y, x)$ resp. $\exists z : lt(x, z) \wedge lt(z, y)$

Fig. 2. Graph predicate for $\exists y : next(x, y) \wedge \forall z : (next(x, z) \Rightarrow z = y)$

It follows that the base case, $depth(p) = 0$, corresponds to $p = \emptyset$. Conditions have
positive depth. We propose $\mathrm{Pred}[\langle X \rangle]$ as representations of FOL formulae over $X$. Note
that in the introduction we discussed predicates consisting of a single condition only,
and in the pictorial representation we omitted the source graph $\langle X \rangle$ (which anyway
would be empty since the constraints discussed there are closed) and only displayed the
structure from $T_c$ onwards. Fig. 1 depicts two constraints with free variables accurately;
Fig. 2 is another example, which shows multiple levels of conditions. The following
defines satisfaction of a predicate $p \in \mathrm{Pred}[G]$, for arbitrary $\theta \in \mathrm{Graph}(G, H)$:

$$\theta \models p \text{ iff } \exists c \in p,\ \mu : T_c \to H : \theta = \mu \circ \alpha_c,\ \mu \not\models p_c . \qquad (1)$$

On the model side this generalizes $\models$ over FOL: here the source of $\theta$ can be an arbitrary
graph, whereas there it was always discrete. An example is given in Fig. 3, which shows
a model of the right-hand predicate of Fig. 1.

Fig. 3. Model satisfying the graph predicate for $\exists z : lt(x, z) \wedge lt(z, y)$
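Definition (1) can be executed by brute force on small graphs. The sketch below rests on assumptions of our own: a condition is a triple of the target graph $T_c$, the morphism $\alpha_c$ as a dictionary, and the nested predicate $p_c$ as a list; `extensions`, `sat_pred` and `sat_cond` are hypothetical names. The sample data is our own encoding of the formula in the caption of Fig. 2, not a transcription of the figure.

```python
from itertools import product

def extensions(src, tgt, fixed):
    """Yield all graph morphisms src -> tgt (as dicts) that agree with `fixed`."""
    src_nodes, src_edges = src
    tgt_nodes, tgt_edges = tgt
    free = sorted(n for n in src_nodes if n not in fixed)
    for images in product(sorted(tgt_nodes), repeat=len(free)):
        m = dict(fixed)
        m.update(zip(free, images))
        if all((m[s], lab, m[t]) in tgt_edges for (s, lab, t) in src_edges):
            yield m

def sat_pred(theta, pred, H):
    """theta |= p iff theta satisfies some condition c in p."""
    return any(sat_cond(theta, c, H) for c in pred)

def sat_cond(theta, cond, H):
    """theta |= c iff some mu: T_c -> H has theta = mu . alpha_c and mu |/= p_c."""
    T, alpha, sub = cond
    fixed = {}
    for v, tv in alpha.items():
        if fixed.setdefault(tv, theta[v]) != theta[v]:
            return False  # alpha merges nodes that theta keeps apart
    return any(not sat_pred(mu, sub, H) for mu in extensions(T, H, fixed))

# exists y: next(x, y) and forall z: (next(x, z) => z = y), as three levels.
level3 = (({"x", "yz"}, {("x", "next", "yz")}),
          {"x": "x", "y": "yz", "z": "yz"}, [])
level2 = (({"x", "y", "z"}, {("x", "next", "y"), ("x", "next", "z")}),
          {"x": "x", "y": "y"}, [level3])
level1 = (({"x", "y"}, {("x", "next", "y")}), {"x": "x"}, [level2])
unique_next = [level1]

H1 = ({1, 2, 3}, {(1, "next", 2)})
H2 = ({1, 2, 3}, {(1, "next", 2), (1, "next", 3)})
```

With $x$ mapped to node 1 the predicate holds in `H1` (exactly one successor) and fails in `H2` (two successors); with $x$ mapped to node 2 it fails in `H1` because there is no successor at all.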
It follows that a predicate $p$ should be seen as the disjunction of its conditions $c \in p$,
and a condition $c$ as the requirement that the structure encoded in $T_c$ is present in the
model, combined with the negation of $p_c$. This interpretation is formalized by recur-
sively translating all $p \in \mathrm{Pred}[G]$ to formulae $\phi_p$. For the translation, we assume that
for all (sub-)conditions $d$ occurring in the structure to be translated, the node sets of
$src(\alpha_d)$ and $tgt(\alpha_d)$ are disjoint. (We show below that this assumption does not cause
loss of generality, since we can always fulfill it by choosing isomorphic representatives
of predicates and conditions.) Furthermore, we assume that for every node $v$ occurring
anywhere in the structure to be translated, there is a distinct associated variable $x_v$.

$$\begin{array}{l}
\phi_p = \bigvee \{ \phi_c \mid c \in p \} \\
\phi_c = \exists \{ x_v \mid v \in N_{T_c} \} : \bigwedge \{ x_v = x_{\alpha_c(v)} \mid v \in N_G \} \wedge
\bigwedge \{ a(x_v, x_w) \mid (v, a, w) \in E_{T_c} \} \wedge \neg\phi_{p_c} .
\end{array}$$

For every graph $K$ occurring in $p$ let $X_K = \{ x_v \mid v \in N_K \}$ and let $\xi_K : \langle X_K \rangle \to K$ be
given by $x_v \mapsto v$ for all $v \in N_K$. The following theorem is one of the main results of
this paper.

Theorem 1. Let $p \in \mathrm{Pred}[G]$ and $\theta \in \mathrm{Graph}(G, H)$; then $\theta \models p$ iff $\theta \circ \xi_G \models \phi_p$.

Proof sketch. We prove the theorem together with an auxiliary result about conditions.
First we extend the modeling relation to conditions, and we simplify the definition of
$\models$ over Pred, as follows:

$$\theta \models c \text{ iff } \exists \mu : T_c \to H : \theta = \mu \circ \alpha_c,\ \mu \not\models p_c \qquad (2)$$

$$\theta \models p \text{ iff } \exists c \in p : \theta \models c . \qquad (3)$$

It follows that $\theta \models p$ iff $\theta \models c$ for some $c \in p$. The proof obligation is extended with:

If $c \in \mathrm{Cond}[G]$ then $\theta \models c$ iff $\theta \circ \xi_G \models \phi_c$.

The proof proceeds by mutual induction on these cases and on the depth of conditions.

□
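The translation $p \mapsto \phi_p$ defined before Theorem 1 can be sketched as a pair of mutually recursive functions that print formulae as strings. The encoding is an assumption for illustration: a condition is a triple of the graph $T_c$, the morphism $\alpha_c$ as a dictionary (whose keys play the role of $N_G$) and the nested predicate; `|`, `&`, `~`, `exists` and `false` are an ad hoc ASCII syntax, and the node names are chosen disjoint between source and target as the text requires.

```python
def phi_pred(pred):
    """phi_p: the disjunction of the formulae of all conditions in p."""
    parts = [phi_cond(c) for c in pred]
    if not parts:
        return "false"  # the empty disjunction
    return " | ".join(f"({s})" for s in parts)

def phi_cond(cond):
    """phi_c: bind the nodes of T_c, tie them to the source graph, add the
    edges of T_c as atoms, and negate the nested predicate."""
    (t_nodes, t_edges), alpha, sub = cond
    bound = ", ".join(f"x_{v}" for v in sorted(t_nodes))
    eqs = [f"x_{v} = x_{alpha[v]}" for v in sorted(alpha)]
    rels = [f"{a}(x_{v}, x_{w})" for (v, a, w) in sorted(t_edges)]
    body = " & ".join(eqs + rels + [f"~({phi_pred(sub)})"])
    return f"exists {bound}: {body}"

# A closed base condition (empty source graph) and the same condition with a NAC.
T = ({"v", "w"}, {("v", "a", "w"), ("w", "b", "v")})
nac = (({"v2", "w2", "z"},
        {("v2", "a", "w2"), ("w2", "b", "v2"), ("w2", "c", "z")}),
       {"v": "v2", "w": "w2"}, [])
base = (T, {}, [])
base_with_nac = (T, {}, [nac])
```

For `base` the result is `exists x_v, x_w: a(x_v, x_w) & b(x_w, x_v) & ~(false)`. For `base_with_nac` the negated part contains the equations `x_v = x_v2` and `x_w = x_w2`, which connect the NAC's copies of the nodes back to the base graph.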



4 Formulae as Graph Predicates 

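We now provide the inverse translation from formulae into graph predicates. For this
purpose we need some constructions over predicates. First, for $\mu \in \mathrm{Graph}(H, G)$, $p \in
\mathrm{Pred}[G]$ and $c \in \mathrm{Cond}[G]$, we define

$$\begin{array}{l}
p \circ \mu = \{ c \circ \mu \mid c \in p \} \\
c \circ \mu = (\alpha_c \circ \mu, p_c) .
\end{array}$$

Clearly, $p \circ \mu \in \mathrm{Pred}[H]$ and $c \circ \mu \in \mathrm{Cond}[H]$. This construction satisfies:

Proposition 2. Let $p \in \mathrm{Pred}[H]$, $\mu \in \mathrm{Graph}(G, H)$ and $\theta \in \mathrm{Graph}(G, K)$; then
$\theta \models p \circ \mu$ iff there is an $\eta \in \mathrm{Graph}(H, K)$ such that $\eta \models p$ and $\theta = \eta \circ \mu$.

Proof: if. Assume $\eta \models p$. It follows that, for some $c \in p$, there is a $\lambda : T_c \to K$ such
that $\eta = \lambda \circ \alpha_c$ and $\lambda \not\models p_c$. But then $\theta = \eta \circ \mu = \lambda \circ \alpha_c \circ \mu = \lambda \circ \alpha_{c \circ \mu}$; since
$c \circ \mu \in p \circ \mu$, it follows that $\theta \models p \circ \mu$.

Only if. Assume $\theta \models p \circ \mu$. It follows that, for some $c \in p$, there is a $\lambda : T_c \to K$ such
that $\theta = \lambda \circ \alpha_{c \circ \mu} = \lambda \circ \alpha_c \circ \mu$ and $\lambda \not\models p_{c \circ \mu}$ $(= p_c)$. But then $\lambda \circ \alpha_c \models p$, and
hence $\eta = \lambda \circ \alpha_c$ fulfills the proof obligation. □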

In the sequel we make heavy use of pushouts in Graph; therefore we introduce some
auxiliary notation. Given $\alpha : G \to K$ and $\mu : G \to H$, we write $\alpha{\uparrow}\mu$ (“the remainder of $\alpha$
after $\mu$”) for the morphism opposite $\alpha$ in the pushout diagram; hence $\alpha{\uparrow}\mu : H \to L$ and
$\mu{\uparrow}\alpha : K \to L$ are such that (among other properties) $(\alpha{\uparrow}\mu) \circ \mu = (\mu{\uparrow}\alpha) \circ \alpha$. We extend
this notation to predicates $p \in \mathrm{Pred}[G]$ and conditions $c \in \mathrm{Cond}[G]$ as follows:

$$\begin{array}{l}
p{\uparrow}\mu = \{ c{\uparrow}\mu \mid c \in p \} \\
c{\uparrow}\mu = (\alpha_c{\uparrow}\mu,\ p_c{\uparrow}(\mu{\uparrow}\alpha_c)) .
\end{array}$$

It follows that $p{\uparrow}\mu \in \mathrm{Pred}[H]$ and $c{\uparrow}\mu \in \mathrm{Cond}[H]$. By taking pushouts, essentially we
merge the additional structure specified by $\mu$ with the existing conditions, in the “least
obtrusive” way. These constructions clearly yield predicates, resp. conditions again. The
following correspondence plays an important role in the sequel.

Proposition 3. Let $p \in \mathrm{Pred}[G]$, $\mu \in \mathrm{Graph}(G, H)$ and $\theta \in \mathrm{Graph}(H, K)$; then
$\theta \circ \mu \models p$ iff $\theta \models p{\uparrow}\mu$.

Proof sketch. The proof strategy is similar to that of Th. 1, by mutual induction on the
depth of predicates and conditions, alternating between “case Pred” in the proposition,
and “case Cond” reading “If $c \in \mathrm{Cond}[G]$, then $\theta \circ \mu \models c$ iff $\theta \models c{\uparrow}\mu$.” □
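The pushout underlying the remainder notation can be computed concretely for finite graphs. The sketch below is our own illustration and assumes the usual construction for edge-labelled graphs without parallel edges: take the disjoint union of the node sets of $H$ and $K$, identify $\mu(v)$ with $\alpha(v)$ for every node $v$ of $G$, and take the images of all edges. The function name `pushout` and the tagging of nodes with `"H"` and `"K"` are hypothetical.

```python
def pushout(g_nodes, mu, H, alpha, K):
    """Pushout of mu: G -> H and alpha: G -> K.

    Returns (L, alpha_up_mu, mu_up_alpha), where alpha_up_mu: H -> L and
    mu_up_alpha: K -> L are the two remainder morphisms as dicts.
    """
    # Union-find over the tagged disjoint union of the node sets.
    parent = {("H", n): ("H", n) for n in H[0]}
    parent.update({("K", n): ("K", n) for n in K[0]})

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Identify the two images of every node of G.
    for v in g_nodes:
        a, b = find(("H", mu[v])), find(("K", alpha[v]))
        if a != b:
            parent[a] = b

    alpha_up_mu = {n: find(("H", n)) for n in H[0]}
    mu_up_alpha = {n: find(("K", n)) for n in K[0]}
    nodes = set(alpha_up_mu.values()) | set(mu_up_alpha.values())
    edges = ({(alpha_up_mu[s], l, alpha_up_mu[t]) for (s, l, t) in H[1]}
             | {(mu_up_alpha[s], l, mu_up_alpha[t]) for (s, l, t) in K[1]})
    return (nodes, edges), alpha_up_mu, mu_up_alpha

# G has one node x; H adds an a-successor y, K adds a c-successor z.
g_nodes = {"x"}
H = ({"x", "y"}, {("x", "a", "y")})
K = ({"x", "z"}, {("x", "c", "z")})
mu = {"x": "x"}
alpha = {"x": "x"}
L, alpha_up_mu, mu_up_alpha = pushout(g_nodes, mu, H, alpha, K)
```

In this example the resulting graph has three nodes and two edges: the shared node carries both the `a` edge from $H$ and the `c` edge from $K$, and the square commutes, i.e. `alpha_up_mu[mu[v]] == mu_up_alpha[alpha[v]]` for every node `v` of $G$.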

We now turn each $\mathsf{Pred}[G]$ and $\mathsf{Cond}[G]$ into a category. The arrows will essentially be proofs of implication. We define the hom-sets $\mathsf{Pred}[G](p, q)$ for $p, q \in \mathsf{Pred}[G]$ and $\mathsf{Cond}[G](c, d)$ for $c, d \in \mathsf{Cond}[G]$ as the families of smallest sets such that:

- $f\colon p \to q$ a function and $\gamma_c \in \mathsf{Cond}[G](c, f(c))$ a condition arrow for all $c \in p$ implies $(f, (\gamma_c)_{c \in p}) \in \mathsf{Pred}[G](p, q)$;

- $\mu\colon T_d \to T_c$ a function (in the reverse direction!) such that $\alpha_c = \mu \circ \alpha_d$ and $\pi \in \mathsf{Pred}[T_c](p_d \uparrow \mu, p_c)$ a compatible predicate arrow implies $(\mu, \pi) \in \mathsf{Cond}[G](c, d)$.

We let $\pi$ range over sets of the form $\mathsf{Pred}[G](p, q)$, and use $f_\pi$ and $\gamma_{\pi,c}$ for $c \in p$ to denote its components. Similarly $\gamma$ ranges over sets of the form $\mathsf{Cond}[G](c, d)$, and $\mu_\gamma$, $\pi_\gamma$ denote its components. The following confirms the intuition that arrows between predicates are proofs of logical implication. The proof again goes by mutual induction (on the depth of $p$) of cases for Pred and Cond.

Proposition 4. Let $\theta \in \mathsf{Graph}(G, H)$ and $p, q \in \mathsf{Pred}[G]$. If $\mathsf{Pred}[G](p, q)$ is non-empty then $\theta \models p$ implies $\theta \models q$.

In preparation of the translation from FOL to graph predicates, we define the following operations over $c, d \in \mathsf{Cond}[G]$ and $p, q \in \mathsf{Pred}[G]$ (for arbitrary $G$):

$c \times d = ((\alpha_c \uparrow \alpha_d) \circ \alpha_d,\; p_c \uparrow (\alpha_d \uparrow \alpha_c) \cup p_d \uparrow (\alpha_c \uparrow \alpha_d))$   (4)

$p + q = p \cup q$   (5)

$p \times q = \{c \times d \mid c \in p, d \in q\}$   (6)

$\neg p = \{(\mathit{id}_G, p)\}$ .   (7)

In passing we note some facts about Pred and Cond. 

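Theorem 2. For all $G \in \mathsf{Graph}$, $\mathsf{Cond}[G]$ is a category with products defined by (4) and initial object $(\mathit{id}_G, \emptyset)$; $\mathsf{Pred}[G]$ is a category with products defined by (6), coproducts defined by (5), initial object $\emptyset$ and terminal object $\{(\mathit{id}_G, \emptyset)\}$.

Proof sketch. Concatenation in $\mathsf{Cond}[G]$ and $\mathsf{Pred}[G]$, for $\pi_i = (f_i, (\gamma_{i,c})_{c \in p_i}) \in \mathsf{Pred}[G](p_i, p_{i+1})$ and $\gamma_i = (\mu_i, \pi_i) \in \mathsf{Cond}[G](c_i, c_{i+1})$ ($i = 1, 2$), is defined by

$\pi_2 \circ \pi_1 = (f_2 \circ f_1,\; (\gamma_{2,f_1(c)} \circ \gamma_{1,c})_{c \in p_1})$
$\gamma_2 \circ \gamma_1 = (\mu_1 \circ \mu_2,\; (\pi_2 \uparrow \mu_1) \circ \pi_1)$ .

where for $\lambda \in \mathsf{Graph}(G, H)$ and $\pi \in \mathsf{Pred}[G](p, q)$, $\gamma \in \mathsf{Cond}[G](c, d)$, the remainders $\pi \uparrow \lambda \in \mathsf{Pred}[H](p \uparrow \lambda, q \uparrow \lambda)$ and $\gamma \uparrow \lambda \in \mathsf{Cond}[H](c \uparrow \lambda, d \uparrow \lambda)$ are defined by

$\pi \uparrow \lambda = (\{(c \uparrow \lambda, f_\pi(c) \uparrow \lambda) \mid c \in p\},\; (\gamma_c \uparrow \lambda)_{c \uparrow \lambda \in p \uparrow \lambda})$
$\gamma \uparrow \lambda = (\mu_\gamma \uparrow \lambda,\; \pi_\gamma \uparrow \lambda)$ .

Note that, in order for $\pi \uparrow \lambda$ to be well-defined, we need to assume distinct $c \uparrow \lambda$. Since we are free to choose these objects up to isomorphism, this assumption causes no loss of generality. The projections $\mathit{pr}_c$ and $\mathit{pr}_d$ for the product in $\mathsf{Cond}[G]$ are given by

$\mathit{pr}_c = (\alpha_d \uparrow \alpha_c,\; \mathit{id}_{p_c \uparrow (\alpha_d \uparrow \alpha_c)})$   $(\in \mathsf{Cond}[G](c \times d, c))$
$\mathit{pr}_d = (\alpha_c \uparrow \alpha_d,\; \mathit{id}_{p_d \uparrow (\alpha_c \uparrow \alpha_d)})$   $(\in \mathsf{Cond}[G](c \times d, d))$ .

□

The following proposition affirms that the operations defined above are appropriate for modeling FOL connectives. This partially follows from the characterization (in Th. 2) of $+$ and $\times$ as coproduct and product in Pred, plus Prop. 4 stating that arrows in Pred induce logical implication.

Proposition 5. Let $\theta \in \mathsf{Graph}(G, H)$ and $p, q \in \mathsf{Pred}[G]$.

1. $\theta \models p + q$ if and only if $\theta \models p$ or $\theta \models q$.

2. $\theta \models p \times q$ if and only if $\theta \models p$ and $\theta \models q$.

3. $\theta \models \neg p$ if and only if $\theta \not\models p$.

The final elements we need for the translation from FOL to graph predicates are representations for the base formulae, $x = y$ and $a(x, y)$. These are given through graph morphisms $\alpha_{x=y}$ and $\alpha_{a(x,y)}$, given pictorially in Fig. 4. The following table defines a function yielding for every $\phi \in \mathsf{FOL}$ and finite $X \supseteq \mathit{fv}(\phi)$ a graph predicate $[\![\phi]\!]_X \in \mathsf{Pred}[\langle X \rangle]$.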
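Fig. 4. Graph morphisms for the base formulae $x = y$, resp. $a(x, y)$

$[\![x = y]\!]_X = \{(\alpha_{x=y}, \emptyset)\} \uparrow \mathit{emb}[\{x, y\}, X]$
$[\![a(x, y)]\!]_X = \{(\alpha_{a(x,y)}, \emptyset)\} \uparrow \mathit{emb}[\{x, y\}, X]$
$[\![\neg\phi]\!]_X = \neg[\![\phi]\!]_X$
$[\![\phi \vee \psi]\!]_X = [\![\phi]\!]_X + [\![\psi]\!]_X$
$[\![\phi \wedge \psi]\!]_X = [\![\phi]\!]_X \times [\![\psi]\!]_X$
$[\![\exists Y.\phi]\!]_X = [\![\phi]\!]_{X \cup Y} \circ \mathit{emb}[X, X \cup Y]$ .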

The following is the second half of the main correspondence result of this paper. 
Theorem 3. Let $\phi \in \mathsf{FOL}$ and $\theta \in \mathsf{Graph}(\langle X \rangle, G)$ with $X \supseteq \mathit{fv}(\phi)$; then $\theta \models \phi$ iff $\theta \models [\![\phi]\!]_X$.

Proof. By induction on the structure of $\phi$. For the base formulae the result is immediate. For negation, disjunction and conjunction the result follows from Prop. 5, and for the existential from Prop. 2. □

It should be clear that there is a direct connection between the depth of graph predicates 
and the level of nesting of negation in the corresponding formula. We will make this 
connection precise. We use paths through the syntax tree of the formulae to isolate the 
relevant fragments of FOL. In the following theorem, a string of $\exists$ and $\neg$ indicates the set of all formulae for which, if we follow their syntax trees from the root to the leaves and ignore all operators except $\exists$ and $\neg$, all the resulting paths are prefixes of that string.

Theorem 4. Let $n \in \mathsf{Nat}$; then the set of predicates $p$ with $\mathit{depth}(p) \leq n$ is equivalent to the FOL-fragment $\exists(\neg\exists)^n$.

Note that we could have formulated the same result while omitting $\exists$; we have included it to stress that $\forall$ is only allowed in its dual form, $\neg\exists\neg$.

Proof. Immediate from the two conversion mappings, $\phi_p$ for $p \in \mathsf{Pred}$ and $[\![\phi]\!]$ for $\phi \in \mathsf{FOL}$; see Th. 1 and Th. 3. □
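The path criterion used to delimit the fragments $\exists(\neg\exists)^n$ can be illustrated with a small sketch. The formula classes below (`Atom`, `Not`, `Exists`, `BinOp`) and the function names are hypothetical helpers introduced only for this illustration; the sketch checks the purely syntactic criterion and assumes the formula is given as written, without any normalisation such as removing double negations.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical mini-AST for FOL formulae (illustrative names, not from the paper).
@dataclass(frozen=True)
class Atom:
    text: str

@dataclass(frozen=True)
class Not:
    body: "Formula"

@dataclass(frozen=True)
class Exists:
    var: str
    body: "Formula"

@dataclass(frozen=True)
class BinOp:  # conjunction or disjunction; ignored when tracing paths
    op: str
    left: "Formula"
    right: "Formula"

Formula = Union[Atom, Not, Exists, BinOp]

def paths(f, prefix=""):
    """Yield one string over {'E', 'N'} per root-to-leaf path,
    keeping only existential quantifiers ('E') and negations ('N')."""
    if isinstance(f, Atom):
        yield prefix
    elif isinstance(f, Not):
        yield from paths(f.body, prefix + "N")
    elif isinstance(f, Exists):
        yield from paths(f.body, prefix + "E")
    else:
        yield from paths(f.left, prefix)
        yield from paths(f.right, prefix)

def in_fragment(f, n):
    """True if every path of f is a prefix of the string E(NE)^n."""
    bound = "E" + "NE" * n
    return all(bound.startswith(p) for p in paths(f))

# Node uniqueness, written with the universal quantifier in its dual form:
# exists y: next(x,y) and not exists z: (next(x,z) and not z = y)
uniq = Exists("y", BinOp("and", Atom("next(x,y)"),
       Not(Exists("z", BinOp("and", Atom("next(x,z)"), Not(Atom("z=y")))))))
```

With this encoding, `in_fragment(uniq, 1)` is false while `in_fragment(uniq, 2)` is true, because the longest path reads `ENEN`.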

This result has consequences in the theory of algebraic graph transformations. The key 
insight is that the application conditions of [10] are exactly graph predicates of depth 0 
(the positive conditions) and 1 (the negative conditions, or NACs) — where it should 
be noted that application conditions are closed formulae, so the corresponding graph 
predicates are in $\mathsf{Pred}[\langle\emptyset\rangle]$; and indeed the presentation in [10] omits the base graph.

Corollary 1. The application conditions proposed in [10] are equivalent to the FOL-fragment $\exists\neg\exists$.

For instance, a useful property that cannot yet be expressed through NACs is the uniqueness of nodes: the property $\exists y\colon \mathit{next}(x, y) \wedge \forall z\colon (\mathit{next}(x, z) \Rightarrow z = y)$, expressed in Fig. 2 by a graph predicate of depth 2, is outside the fragment $\exists\neg\exists$. More generally, as noted in the conclusion of [10], NACs cannot impose cardinality constraints, which in fact all have roughly the form $\exists X\colon (\bigwedge_{x \neq y \in X} x \neq y) \wedge \forall z\colon \bigvee_{x \in X} z = x$, and hence are in $\exists\neg\exists\neg$.³



5 Graph Predicates as Graphs 

The way we defined them, graph predicates are highly structured objects. We now show 
that much of this structure can be done away with: there is an equivalent representation 
of graph conditions as sets of simple, flat graphs, in which the nesting structure is trans- 
ferred to the edge labels. Graph predicates thus correspond to sets of such graphs. In 
this section we define the conversions back and forth. All proofs, being rather technical 
and uninteresting, are omitted. In this section, $\mathsf{Lab}$ is as follows:

$\mathsf{Lab} = \mathsf{Rel} \cup (\mathsf{Rel}_= \times \mathsf{Nat})$ .

- $\mathsf{Rel}$ is the set of relation symbols, as before. Plain relation symbols (i.e., without depth indicators, see below) are called base labels; they correspond to the base graph $G$ of a predicate $p \in \mathsf{Pred}[G]$.

- $\mathsf{Rel}_= \times \mathsf{Nat}$, where $\mathsf{Rel}_= = \mathsf{Rel} \cup \{=\}$, consists of pairs of a relation symbol (this time including equality $=$, see below), together with a natural number indicating the depth of the edge. For $\mathsf{Rel}$ this will be the depth in $p$ at which the edge is introduced; for $=$ it will be the depth at which the nodes are introduced or equated.

We use $b$ to range over $\mathsf{Rel}_=$ and ${}^{i}b$ as shorthand for $(b, i)$; we use $\ell$ to range over $\mathsf{Lab}$. Furthermore, we use $\bot$ to denote the base depth and regard it as an element smaller than any $i \in \mathsf{Nat}$, and such that $i - 1 = \bot$ if $i \in \{\bot, 0\}$. We use $\delta, \epsilon$ to range over $\mathsf{Nat} \cup \{\bot\}$ and sometimes use ${}^{\bot}a$ as equivalent notation for $a$ $(\in \mathsf{Rel})$. We define the depth of nodes and edges as

- $\mathit{depth}(v) = i$ if $(v, {}^{i}{=}, v) \in E$, and $\bot$ otherwise;

- $\mathit{depth}(e) = \delta$ if $e = (v, {}^{\delta}a, w)$.

(Node depth is well-defined due to the first well-formedness constraint below.) We call $a \in N \cup E$ base if $\mathit{depth}(a) = \bot$. In the remainder of this section we use $\mathsf{RelGraph}$ to denote the set of graphs over $\mathsf{Rel}$ used in the previous sections, and $\mathsf{CondGraph}$ to denote the set of condition graphs, which are graphs over $\mathsf{Lab}$ that satisfy the following well-formedness constraints, in addition to those already mentioned above:

3 This is no longer true when matchings are required to be injective; see Sect. 6 for a conjecture 
about the increased expressiveness of that setting. Also, as remarked in the introduction, the 
double-pushout approach imposes further application conditions to ensure the existence of 
pushout complements, which we ignore here. 
Fig. 5. Example condition graphs.

1. For any $v, w \in N$ and $b \in \mathsf{Rel}_=$ there is at most one $\delta$ such that $(v, {}^{\delta}b, w) \in E$.

2. If $(v, {}^{\delta}b, w) \in E$ then $\mathit{depth}(v), \mathit{depth}(w) \leq \delta$.

3. If $(v, {}^{i}{=}, w) \in E$ then $(w, {}^{i}{=}, v) \in E$.

Note that $\mathsf{RelGraph} \subseteq \mathsf{CondGraph}$. Fig. 5 contains some example condition graphs. We indicate node depths by inscribing the depth inside nodes, and edge depths by appending the depth to the label. $=$-labeled edges, which are always bidirectional due to well-formedness condition 3, are drawn undirected. Graphs (i) and (ii) are flattenings of the morphisms $\alpha_{x=y}$ and $\alpha_{a(x,y)}$ displayed in Fig. 4. Graph (iii) is the flattening of the right hand condition of Fig. 1; (iv) is the condition of Fig. 2 without the base level and (v) the complete condition. Graph (vi) represents the (right hand, connected) condition of Fig. 7 below.

Any graph morphism $\alpha\colon G \to C$ can be flattened to a condition graph, $\mathit{flat}(\alpha)$, by enriching $G$ with the structure provided by $\alpha$ while keeping it distinguishable from the structure already present in $G$, so that $\alpha$ can be fully reconstructed (up to isomorphism of $C$). There are essentially three kinds of additional structure: fresh nodes of $C$, fresh edges of $C$, and node mergings, i.e., nodes on which $\alpha$ is non-injective. Node mergings and fresh nodes will be indicated using $=$-labeled edges, and fresh edges by a (non-base) depth indication. In general we allow $G \in \mathsf{RelGraph}$ and $C \in \mathsf{CondGraph}$. W.l.o.g. assume $N_G \cap N_C = \emptyset$; then $\mathit{flat}(\alpha) = (N, E) \in \mathsf{CondGraph}$ such that

$N = N_G \cup (N_C \setminus \alpha(N_G))$

$E = E_G \cup \{(u, {}^{0}a, v) \mid (u, a, v) \notin E_G,\ (\bar\alpha(u), a, \bar\alpha(v)) \in E_C\}$
$\quad \cup\ \{(u, {}^{0}{=}, v) \mid u \neq_G v,\ \bar\alpha(u) = \bar\alpha(v)\}$
$\quad \cup\ \{(u, {}^{i+1}b, v) \mid (\bar\alpha(u), {}^{i}b, \bar\alpha(v)) \in E_C\}$ .

Here $\bar\alpha = \alpha \cup \mathit{id}_{N \setminus N_G}$ extends $\alpha$ with identity mappings for the fresh nodes, and $=_G$ is the identity relation over $N_G$. Note that we not only add depth indicators to the additional structural elements, but also increment the depth indicators already present in $C$, so as to keep them distinct. For instance, note that, as expected, graphs (i) and (ii) in Fig. 5 indeed equal $\mathit{flat}(\alpha_{x=y})$ and $\mathit{flat}(\alpha_{a(x,y)})$ (see Fig. 4).
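The flattening of a single morphism can be sketched in a few lines. The encoding below is an assumption made for illustration only: a graph is a pair of a node set and an edge set, an edge is a triple (source, label, target), a base label is a plain string, a depth-annotated label is a pair (symbol, depth), and the morphism is given by its node map as a dictionary. The function name `flat` mirrors the text, but the code is not taken from the paper.

```python
def flat(G, C, alpha):
    """Flatten alpha: G -> C into a condition graph (N, E).

    G has base labels only; C may carry labels (b, i) with depth i.
    alpha maps every node of G to a node of C.
    """
    NG, EG = G
    NC, EC = C
    assert not (NG & NC), "node sets are assumed disjoint (w.l.o.g.)"

    fresh = NC - set(alpha.values())        # nodes of C outside the image of alpha
    N = NG | fresh
    abar = dict(alpha)                      # alpha extended with identities
    abar.update({v: v for v in fresh})

    pre = {}                                # preimages of each node of C under abar
    for u in N:
        pre.setdefault(abar[u], set()).add(u)

    E = set(EG)
    for (x, lab, y) in EC:
        for u in pre.get(x, ()):
            for v in pre.get(y, ()):
                if isinstance(lab, tuple):  # existing depth: increment it
                    b, i = lab
                    E.add((u, (b, i + 1), v))
                elif (u, lab, v) not in EG: # fresh base edge: depth 0
                    E.add((u, (lab, 0), v))

    for cls in pre.values():                # fresh nodes and node mergings
        for u in cls:
            for v in cls:
                if not (u == v and u in NG):
                    E.add((u, ("=", 0), v))
    return N, E

# A morphism that merges two nodes, and one that adds an a-edge (toy inputs).
merge = flat(({"x", "y"}, set()), ({"n"}, set()), {"x": "n", "y": "n"})
edge = flat(({"x", "y"}, set()), ({"p", "q"}, {("p", "a", "q")}), {"x": "p", "y": "q"})
```

In this encoding `merge` consists of the two nodes joined by depth-0 equality edges in both directions, and `edge` carries the single edge `("x", ("a", 0), "y")`.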

For the inverse construction, we need to resurrect the original source and target graphs from the flattened morphism; or more generally, we construct morphisms from condition graphs. In principle, the source graph is the sub-graph with base depth, whereas the target graph is the entire condition graph, where, however, the depths are decremented and the nodes connected by $=$-edges are collected into single nodes.

In the following, given a binary relation $\rho$ over a set $A$, for any $a \in A$ we define $[a]_\rho$ as the smallest set such that (i) $a \in [a]_\rho$, and (ii) if $b \in [a]_\rho$ and $\rho(b, c)$ or $\rho(c, b)$ then $c \in [a]_\rho$. Likewise, $A/\rho = \{[a]_\rho \mid a \in A\}$ is the partitioning of $A$ according to $\rho$.

Let $G \in \mathsf{RelGraph}$, $C \in \mathsf{CondGraph}$ and $\mu = (G, C, f)$. We define $C|_\bot$ as the base part of $C$, and $C^-$ as $C$ considered "one level up", i.e., with all depth indicators decremented and all $=$-related nodes collected (considering $v = w$ iff $(v, {}^{0}{=}, w) \in E$). Using these, we construct $\phi_C\colon C|_\bot \to C^-$ mapping each $v \in N_{C|_\bot}$ to $[v]_=$.

$C|_\bot = (N_C|_\bot, E_C|_\bot)$ where $A|_\bot = \{a \in A \mid \mathit{depth}(a) = \bot\}$   (8)

$C^- = (N_C/{=},\ \{([u]_=, {}^{\delta-1}b, [v]_=) \mid (u, {}^{\delta}b, v) \in E_C \wedge (b \in \mathsf{Rel} \vee \delta > 0)\})$   (9)

$\mu|_\bot = (G, C|_\bot, f)$   (10)

$\phi_C = (C|_\bot, C^-, \{(v, [v]_=) \mid v \in N_C|_\bot\})$ .   (11)
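The constructions (8), (9) and (11) can be sketched as follows. The encoding is again an assumption for illustration: a condition graph is a pair of a node set and a set of edges (source, label, target), base labels are plain strings, and depth-annotated labels are pairs (symbol, depth); the helper names are hypothetical. Equivalence classes are represented as frozensets, and only depth-0 equality edges are used for merging.

```python
def node_depth(C, v):
    """Depth of node v: the depth of its '='-loop, or None for the base depth."""
    _, E = C
    for (u, lab, w) in E:
        if u == v == w and isinstance(lab, tuple) and lab[0] == "=":
            return lab[1]
    return None

def base_part(C):
    """C restricted to base nodes and base edges, as in (8)."""
    N, E = C
    nodes = {v for v in N if node_depth(C, v) is None}
    edges = {e for e in E if not isinstance(e[1], tuple)}
    return nodes, edges

def classes(C):
    """Map each node to its equivalence class under depth-0 '='-edges."""
    N, E = C
    cls = {v: frozenset([v]) for v in N}
    for (u, lab, w) in E:
        if lab == ("=", 0) and cls[u] != cls[w]:
            merged = cls[u] | cls[w]
            for x in merged:
                cls[x] = merged
    return cls

def one_level_up(C):
    """C with depths decremented and '='-related nodes collected, as in (9)."""
    _, E = C
    cls = classes(C)
    edges = set()
    for (u, lab, w) in E:
        if not isinstance(lab, tuple):
            edges.add((cls[u], lab, cls[w]))            # base stays base
        else:
            b, d = lab
            if d > 0:
                edges.add((cls[u], (b, d - 1), cls[w]))
            elif b != "=":
                edges.add((cls[u], b, cls[w]))          # depth 0 becomes base
            # depth-0 '='-edges are dropped: they are consumed by the merging
    return set(cls.values()), edges

def phi(C):
    """Node map of the morphism from the base part to one_level_up(C), as in (11)."""
    cls = classes(C)
    return {v: cls[v] for v in base_part(C)[0]}

# Toy condition graph: base node x, fresh node q at depth 0, fresh a-edge.
C = ({"x", "q"}, {("x", ("a", 0), "q"), ("q", ("=", 0), "q")})
```

For this `C`, the base part is the single node `x`, and one level up the graph has two nodes joined by a base `a`-edge, which is the morphism that adds a fresh node and a fresh edge.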

Resurrecting a morphism is left inverse to flattening, but it is not right inverse for all condition graphs. The latter is due to the fact that we have not bothered to give an exact characterization of those condition graphs that may be constructed by flattening. Such a characterization would be cumbersome and would not add much to the current paper: the main purpose here is to show that graph conditions may be flattened without loss of information.

Proposition 6. Let $G \in \mathsf{RelGraph}$ and $C \in \mathsf{CondGraph}$ with $\alpha\colon G \to C$.

1. There exists an isomorphism $\mu$ such that $\phi_{\mathit{flat}(\alpha)} = \alpha \circ \mu$.

2. There exists an epimorphism p:C — > 

We now extend these principles to graph conditions $c$, which are essentially nested morphisms. Here the depth indicators really come into play. The construction proceeds by flattening the sub-conditions in $p_c$, taking the union of the resulting graphs as an extended target graph for $\alpha_c$ and then flattening (the extended) $\alpha_c$.

$\mathit{flat}(c) = \mathit{flat}(\beta)$ where $\beta = \mathit{emb}[T_c, \bigcup_{d \in p_c} \mathit{flat}(d)] \circ \alpha_c$   (12)

where $\mathit{emb}[G, H] \in \mathsf{Graph}(G, H)$ is given by $(G, H, \mathit{id}_{N_G})$ if $N_G \subseteq N_H$, $E_G \subseteq E_H$. For the inverse construction, we need to reconstruct the $d \in p_c$ from $\mathit{flat}(c)$. For this purpose, we use the connectedness of $\mathit{flat}(c)$. A fragment of a condition graph will be taken as part of the same sub-condition if it is connected at depth $> 0$. For instance, graph (v) in Fig. 5 has one connected sub-condition, whereas graph (vi) has two.

The required notion of connectedness can be captured through the decomposition of morphisms into primes, as follows. For $\lambda \in \mathsf{Graph}(G, H)$ and $\mu \in \mathsf{Graph}(G, K)$ we define $\lambda +^G \mu = (\lambda \uparrow \mu) \circ \mu$; the superscript $G$ stands for the source graph of the morphisms considered. More formally, this is the coproduct operation in a category of morphisms with source $G$ (the slice category of Graph under $G$). Connectedness, in the above sense, is related to the ability to decompose a morphism into summands. We call $\mu$ prime if it has no non-trivial decomposition under $+^G$; that is, if $\mu = \sum^G M$ for a set of morphisms $M$ (where $\sum^G \emptyset = \mathit{id}_G$) implies that $M$ contains some isomorphic representative of $\mu$. The following characterizes prime morphisms.
Fig. 6. Steps in the construction of a graph condition

Proposition 7. $\mu \in \mathsf{Graph}(G, H)$ is prime iff one of the following holds:

1. $\mu$ is an epimorphism, and non-injective on exactly one pair of nodes;

2. $\mu$ is a monomorphism, and the set $A$ of nodes and edges in $H$ that do not occur as images of $\mu$ is connected by $\leftrightarrow\ \subseteq A \times A$, defined as the least relation such that $(v, \ell, w) \leftrightarrow v$ and $(v, \ell, w) \leftrightarrow w$ whenever $(v, \ell, w), v, w \in A$.

The key property in the use of prime morphisms, stated in the following proposition, is that every morphism $\mu\colon G \to H$ can be decomposed into a finite number of them.

Proposition 8. For all $\mu \in \mathsf{Graph}(G, H)$, there is a finite set of prime morphisms $P$, such that $\mu = \alpha \circ \sum^G P$ for some isomorphism $\alpha$.

Note that the decomposition $P$ is not unique, even up to isomorphism; however, if $\mu$ is an isomorphism then $P = \emptyset$ is the only possibility. For the developments in this section it does not matter which decomposition we take; rather, we assume that $\mathit{primes}(\mu)$ is some (fixed) prime decomposition. Let $G \in \mathsf{RelGraph}$, $C \in \mathsf{CondGraph}$ and $\mu\colon G \to C$.

$\mathit{cond}(\mu) = (\mu|_\bot, \{\mathit{cond}(\eta) \mid \eta \in \mathit{primes}(\phi_{\mathit{tgt}(\mu)})\})$   (13)
$\mathit{cond}(C) = \mathit{cond}(\phi_C)$ .   (14)

Thus, $\mathit{cond}$ constructs a graph condition from a morphism by turning its target graph into a new morphism, and recursively calling itself on its prime decomposition. This terminates because for $\eta \in \mathit{primes}(\phi_D)$ with $D = \mathit{tgt}(\mu)$, all depth indicators of $\mathit{tgt}(\eta)$ have been decreased w.r.t. $D$; and if $D$ is base then $\phi_D$ is an iso, hence $\mathit{primes}(\phi_D) = \emptyset$. For example, Fig. 6 shows several stages of constructing $\mathit{cond}(C)$, with $C$ the graph on the left hand side. $\phi_C$ and $\phi_{C_1}$ are themselves prime, but $\mathit{primes}(\phi_{C_2}) = \{\mu_3, \mu_4\}$. From the figure we can see $\mathit{cond}(C) = (\mu_1|_\bot, \{(\mu_2|_\bot, \{(\mu_3|_\bot, \emptyset), (\mu_4|_\bot, \emptyset)\})\})$. This construction gives us half of the desired correspondence (compare Prop. 6.2):

Proposition 9. All $C \in \mathsf{CondGraph}$ have an epimorphism $\mu\colon C \to \mathit{flat}(\mathit{cond}(C))$.
Fig. 7. An unconnected graph condition and its connected normal form

Still, this does not yet solve the problem, since graph conditions do not generally have the connectedness required to reconstruct them from their flattenings. For instance, $\mathit{flat}(c_1)$ for $c_1$ as in Fig. 7 yields $C$ of Fig. 6, but $\mathit{cond}(C)$ is not isomorphic to $c_1$, and indeed the two are also inequivalent as properties over graphs. On the other hand, $c_2$ in Fig. 7 is equivalent to $c_1$ in this sense; $\mathit{flat}(c_2)$ yields graph (vi) of Fig. 5, from which $\mathit{cond}$ does construct a condition isomorphic to $c_2$.

To formulate the required connectedness property, we enrich the target graph $T_c$ of conditions $c$ with information about the connections made deeper in the tree of morphisms underlying $c$. For arbitrary $p \in \mathsf{Pred}[G]$ and $c \in \mathsf{Cond}[G]$ we define:

$G^{\leftrightarrow}_p = (N_G, E_G \cup \{(v, \leftrightarrow, w) \mid \exists c \in p\colon \alpha_c(v) \text{ is connected to } \alpha_c(w) \text{ in } (T_c)^{\leftrightarrow}_{p_c}\})$
$\alpha^{\leftrightarrow}_c = (G, (T_c)^{\leftrightarrow}_{p_c}, \alpha_c)$ .

This brings us to the following definition of connectedness.

Definition 5.1. Graph condition $c \in \mathsf{Cond}[G]$ is called connected if for all $d \in p_c$, $\alpha^{\leftrightarrow}_d$ is prime and $d$ is connected.

The following proposition lists the important facts about connected graph conditions: the graph conditions constructed by $\mathit{cond}$ (14) are always connected, connected graph conditions can be flattened without loss of information (compare Prop. 6.1), and for every graph condition there is an equivalent connected one.

Proposition 10. Let $c \in \mathsf{Cond}[G]$.

1. If $c = \mathit{cond}(C)$ for some $C \in \mathsf{CondGraph}$, then $c$ is connected.

2. If $c$ is connected, then there is an isomorphism $\gamma\colon c \to \mathit{cond}(\mathit{flat}(c))$.

3. There is a connected $c' \in \mathsf{Cond}[G]$ such that $\theta \models c'$ if and only if $\theta \models c$.

This brings us to the main result of this section, which states that every graph condition can be flattened to a condition graph expressing the same property. In order to represent a graph predicate $p \in \mathsf{Pred}[G]$, we use the set of condition graphs $\{\mathit{flat}(c) \mid c \in p\}$.

Theorem 5. Let $c \in \mathsf{Cond}[G]$; then there is a $C \in \mathsf{CondGraph}$ such that $\theta \models \mathit{cond}(C)$ if and only if $\theta \models c$.
6 Conclusions

We have presented an equivalent representation of first-order logic, as a recursively 
nested set of graph morphisms. We have defined compositional translations back and 
forth, and given an expressiveness result relating the recursive depth of our graph pred- 
icates to the corresponding fragment of FOL. 

Subsequently, we have shown how the nested graph predicates can be translated, 
without loss of information, to flat graphs. We see as the main advantage of this that we 
can now use graph transformations to transform predicate graphs. This points the way 
to a potential connection between graph transformation and predicate transformation. 

Graph constraints and conditional application conditions. At several points during the 
paper we already mentioned the work on (negative) application conditions within the 
theory of algebraic graph transformation, originally by [6] and later worked out in more 
detail in [10]. This paper is the result of an attempt to generalize their work. 

Other related work in the context of graph transformations may be found in the conditional application conditions of [11] and in the graph constraints as proposed in, e.g., [12] and implemented in the AGG tool (cf. [9]). In fact it is not difficult to see, using the results of this paper, that conditional application conditions are expressively equivalent to the $\exists\neg\exists\neg\exists$-fragment of FOL, and graph constraints to the $\neg\exists\neg\exists$-fragment.

We conjecture that the requirement that all matches be injective increases the ex- 
pressive power precisely by allowing inequality (but no other forms of negation) to 
occur in the context of the inner 3. 

Existential graphs. A large body of related work within (mainly) artificial intelligence 
centers around Peirce’s existential graphs [14,15] and Sowa’s more elaborate concep- 
tual graphs [17]. The former were introduced as pragmatic philosophical models for 
reasoning over a century ago; the latter are primarily intended as models for knowledge 
representation. There are obvious similarities between those models and ours, espe- 
cially concerning the use of nesting to represent negation and existential quantification. 
On the other hand, the thrust of the developments in existential and conceptual graphs 
is quite different from ours, making a detailed comparison difficult. New in the current 
paper seems to be the use of edge labels to encode the nesting structure; throughout 
the work on existential and conceptual graphs we have only seen this represented using 
so-called cuts, which are essentially sub-graphs explicitly describing the hierarchical 
structure. For us, the advantage of our approach is that our models can be treated as 
simple graphs, and as such submitted to existing graph transformation approaches. 

More or less successful approaches to define a connection between existential, resp. conceptual, graphs and FOL can be found in [2,18,5]. In particular, in [18,5] a complete theory of FOL is developed on the basis of the graph representations.

Variations. We have chosen a particular category of graphs and a particular encoding 
of FOL that turn out to work well together. However, it is interesting also to consider 
variations in either choice. For instance, our graphs do not have parallel edges, and our 
encoding does not allow reasoning about edge labels. It is likely that similar results can 
be obtained in an extended setting by moving to graph logics as in [1,4]. 
As an example of the use of such extensions, consider the so-called dangling edge condition that partially governs the applicability of a double-pushout rule (see also Footnote 1). This condition (under certain circumstances) forbids the existence of any edges outside an explicitly given set. In the setting of this paper, there is no uniform way to express such a constraint, since it requires the ability to refer to edges explicitly while abstracting from their labels.

Open issues. We list some questions raised by the work reported here. 

- A direct semantics for condition graphs. The flat graph representations of Sect. 5 are 
shown to be equivalent by a translation back and forth to the nested graph predicate 
structures. Currently we do not have a modeling relation directly over condition 
graphs. 

- The connection to predicate transformations. Traditional approaches to predicate 
transformation have to go to impressive lengths to represent pointer structures 
(e.g., [13]), whereas on the other hand graphs are especially suitable for this. Using 
the condition graph representation of predicates presented here, one can use graph 
transformations to construct or transform predicates. 

- The extension of existing theory on graph transformation systems to support full 
graph predicates instead of (conditional) application conditions; for instance, rule 
independency in the context of negative application conditions developed in [10], 
and the translation of postconditions to application conditions in [11,12,7]. 
Abstract. In the field of Knowledge Discovery, graphs of concepts are an expressive and versatile modeling technique that provides ways to reason about information implicit in the data. Typically, nodes of these graphs represent unstructured closed patterns, such as sets of items, and edges represent the relationships of specificity among them. In this paper we consider the case where the data keeps an order, and nodes of the concept graph represent complex structured patterns. We contribute by first characterizing a lattice of closed partial orders that precisely summarizes the original ordered data; and second, we show that this lattice can be obtained via coproduct transformations on a simpler graph of so-called stable sequences. In practice, this graph transformation implies that algorithms for mining plain sequences can efficiently transform the discovered patterns into a lattice of closed partial orders, thus avoiding the complexity of mining the partial orders directly from the data.



1 Introduction 

Formal Concept Analysis, mainly developed by [6], is based on the mathematical 
theory of complete lattices (e.g. [5]). This area has been used in a large variety 
of fields in computer science, such as in Knowledge Discovery where graphs 
of concepts are an expressive modeling technique to show structural relations 
implicit in the given set of data ([9, 10, 13]). 

The two basic notions of Formal Concept Analysis are those of formal context and formal concept. In the main case of interest for Knowledge Discovery, a formal context consists of a binary relation $R$ that can be regarded as a set of objects associated with a set of items (attributes holding in each object), that is, $R \subseteq \mathcal{O} \times \mathcal{I}$. On the other hand, a formal concept is a pair of a closed set of objects and a closed set of items linked by a Galois connection. To characterize a formal concept we need to define appropriate closure operators on the universe of items and objects respectively.

A closure operator $\Gamma$ on a lattice, such as the one formed by the subsets of any fixed universe, satisfies the three basic closure axioms: monotonicity, extensivity and idempotency. It follows from these properties that the intersection of closed sets, which are those sets that coincide with their closure, is also another closed set. One way of constructing closure operators is by composition of two derivation operators forming a Galois connection [6]. The standard Galois connection for a binary database $R$ maps each family of objects to the set of the items that hold in all of them, and each set of items to the set of objects in which they hold. Then, the resulting closure operator $\Gamma$ acts as follows: given $R$, the closure $\Gamma(Z)$ of a set of items $Z \subseteq \mathcal{I}$ includes all items that are present in all objects having all items in $Z$. The closed sets obtained are exactly the closed sets employed in closed set mining (see e.g. [9]), which in certain applications presents many advantages (see e.g. [3,9,10,13]). Similarly, we can consider the dual operator $\Gamma^{-1}$ operating on the universe of the set of objects $\mathcal{O}$ and giving rise to closed sets of objects, that is, $\Gamma^{-1}(O) = O$ where $O \subseteq \mathcal{O}$. For any binary database $R$, the closure systems of $\Gamma$ and $\Gamma^{-1}$ are isomorphic.
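The closure operator obtained from the standard Galois connection can be illustrated on a toy binary database. Everything in the sketch below (the function names, the objects 1 to 3 and the items a, b, c) is hypothetical and chosen only to make the two derivation operators and their composition concrete.

```python
from itertools import combinations

def common_items(R, objs, items):
    """Items that hold in every object of objs."""
    return frozenset(i for i in items if all((o, i) in R for o in objs))

def common_objects(R, itemset, objects):
    """Objects in which every item of itemset holds."""
    return frozenset(o for o in objects if all((o, i) in R for i in itemset))

def closure(R, Z, objects, items):
    """Closure of the itemset Z: compose the two derivation operators."""
    return common_items(R, common_objects(R, Z, objects), items)

def powerset(xs):
    xs = sorted(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

# Hypothetical binary database: object 1 has {a, b}, object 2 has {a, b, c}, object 3 has {b}.
OBJECTS = {1, 2, 3}
ITEMS = {"a", "b", "c"}
R = {(1, "a"), (1, "b"), (2, "a"), (2, "b"), (2, "c"), (3, "b")}

def gamma(Z):
    return closure(R, Z, OBJECTS, ITEMS)

# The closed sets are the fixed points of the closure operator.
closed_sets = {Z for Z in powerset(ITEMS) if gamma(Z) == Z}

# The three closure axioms, checked exhaustively on this small universe.
extensive = all(Z <= gamma(Z) for Z in powerset(ITEMS))
idempotent = all(gamma(gamma(Z)) == gamma(Z) for Z in powerset(ITEMS))
monotone = all(gamma(Y) <= gamma(Z)
               for Y in powerset(ITEMS) for Z in powerset(ITEMS) if Y <= Z)
```

On this database the closure of {a} is {a, b}, since every object containing a also contains b, and the closed sets are {b}, {a, b} and {a, b, c}.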

The closed sets found in the database $R$ can be graphically organized in a hierarchical order, called the concept lattice, i.e. a graph where each node is a pair $(O, Z)$ formed by the closed itemset $\Gamma(Z) = Z$ and the maximal set of objects $O$ where $Z$ is contained (symmetrically, for $\Gamma^{-1}(O) = O$); so, each node is labelled by the pair of closed sets that is joined by the Galois connection. This concept lattice provides a comprehensive graphical representation that shows the structural relations between the concepts and summarizes, at the same time, all the characteristics of the binary data.

In this paper we want to analyze the case where R is not a binary relation, 
but the items keep an order in each one of the objects where they hold. Here we 
study the construction of a lattice where formal concepts model more complex 
structures, such as partial orders. Our contribution is to prove that this clo- 
sure system of partial orders can be obtained by coproduct transformations on 
a simpler graph of so-called stable sequences studied recently ([4, 11, 12]). Algo- 
rithmically, this transformation avoids the computation of partial orders directly 
from the data. 



2 Preliminaries 

Let $\mathcal{I} = \{i_1, \ldots, i_n\}$ be a finite set of items. Sequences are ordered lists of itemsets where we assume that no item occurs more than once in a sequence. The input data we are considering consists of a database of ordered transactions that we model as a set of sequences, $\mathcal{D} = \{s_1, s_2, \ldots, s_n\}$. Our notation for the component itemsets of a given sequence will be $s = \langle (I_1)(I_2) \ldots (I_n) \rangle$, where each $I_i \subseteq \mathcal{I}$ and $I_i$ occurs before itemset $I_j$ if $i < j$. Note that each $I_i$ may contain several items that occur simultaneously; e.g. $\langle (AC)(B) \rangle$, where items $A$ and $C$ are given at the same time but always before item $B$. However, in order to simplify the definitions of the next theoretical discussion we consider for the moment that each $I_i$ contains only one single item. Later in the paper, we will show how the simultaneity condition can be incorporated again into the theoretical framework. The set of all possible sequences will be denoted by $\mathcal{S}$.
From the point of view of Formal Concept Analysis, we can consider each input sequence as an object. Formally, data $\mathcal{D}$ can be transformed into an ordered context (see [4]) where objects of the context are sequences, attributes of the context are items, and the database becomes a ternary relation, subset of $\mathcal{O} \times \mathcal{I} \times \mathbb{N}$, in which each tuple $(o, i, t)$ indicates that item $i$ appears in the $t$-th element of the object $o$ representing an input sequence $s$. A simple example of the described data and the associated context can be found in figure 1, where each object $o_i$ of the formal context represents the corresponding input sequence, $s_i \in \mathcal{D}$. The context for a set of data $\mathcal{D}$ is only relevant to this paper to see objects $o_i \in \mathcal{O}$ and input sequences $s_i \in \mathcal{D}$ as equivalent, which eases the definition of the subsequent closure operator on ordered data.



(a) Collection of data $\mathcal{D}$

Seq id    Sequence
$s_1$     $\langle (A)(B)(C)(D) \rangle$
$s_2$     $\langle (B)(C)(D)(A) \rangle$
$s_3$     $\langle (B)(C)(A)(D) \rangle$

(b) Context for $\mathcal{D}$

          A    B    C    D
$o_1$     1    2    3    4
$o_2$     4    1    2    3
$o_3$     3    1    2    4

Fig. 1. Example of ordered data $\mathcal{D}$ and its context.
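The passage from input sequences to the ordered context can be sketched directly. The dictionary `D` below encodes the three sequences of figure 1 under the single-item simplification; the function names are hypothetical and serve only this illustration.

```python
# The data of figure 1, one item per position (single-item simplification).
D = {
    "o1": ("A", "B", "C", "D"),
    "o2": ("B", "C", "D", "A"),
    "o3": ("B", "C", "A", "D"),
}

def ordered_context(data):
    """Ternary relation of tuples (object, item, position), positions starting at 1."""
    return {(o, item, t)
            for o, seq in data.items()
            for t, item in enumerate(seq, start=1)}

def position_table(context):
    """Rearrange the ternary relation as {object: {item: position}}."""
    table = {}
    for (o, item, t) in context:
        table.setdefault(o, {})[item] = t
    return table

context = ordered_context(D)
table = position_table(context)
```

Each row of `table` corresponds to one row of the context in figure 1 (b); for example, `table["o2"]` places A at position 4.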



Sequence $s = \langle (I_1) \ldots (I_n) \rangle$ is a subsequence of another $s' = \langle (I'_1) \ldots (I'_m) \rangle$ if there exist integers $j_1 < j_2 < \cdots < j_n$ such that $I_1 \subseteq I'_{j_1}, \ldots, I_n \subseteq I'_{j_n}$. We denote this by $s \subseteq s'$. For example, the sequence $\langle (A)(D) \rangle$ is a subsequence of the first and third input sequences in figure 1.

The intersection of a set of sequences $s_1, \ldots, s_n \in \mathcal{S}$ is the set of maximal subsequences contained in all the $s_i$. Note that the intersection of a set of sequences, or even the intersection of two sequences, is not necessarily a single sequence. For example, the intersection of the two sequences $s = \langle (A)(C)(B) \rangle$ and $s' = \langle (A)(B)(C) \rangle$ is the set of sequences $\{\langle (A)(C) \rangle, \langle (A)(B) \rangle\}$: both are contained in $s$ and $s'$, and among those having this property they are maximal; all other common subsequences are not maximal since they can be extended to one of these. The maximality condition discards redundant information since the presence of, e.g., $\langle (A)(B) \rangle$ in the intersection already informs of the presence of each of the itemsets $(A)$ and $(B)$.
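The subsequence relation and the intersection of sequences can be sketched as follows, under the single-item simplification adopted above (each sequence is a tuple of items). The function names are hypothetical, and the intersection is computed by brute-force enumeration, which is only reasonable for short sequences.

```python
from itertools import combinations

def is_subsequence(s, t):
    """True if s can be obtained from t by deleting items (order preserved)."""
    it = iter(t)
    return all(x in it for x in s)   # 'x in it' advances the iterator past x

def subsequences(s):
    """All subsequences of s, including the empty one."""
    return {tuple(s[i] for i in idx)
            for r in range(len(s) + 1)
            for idx in combinations(range(len(s)), r)}

def intersection(seqs):
    """Maximal subsequences common to all sequences in seqs."""
    common = set.intersection(*(subsequences(s) for s in seqs))
    return {s for s in common
            if not any(s != t and is_subsequence(s, t) for t in common)}

example = intersection([("A", "C", "B"), ("A", "B", "C")])
```

Here `example` contains exactly the two sequences `("A", "C")` and `("A", "B")`, matching the example in the text.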



2.1 Stable Sequences and Closure Operators 

A sequence $s$ is stable in input data $\mathcal{D}$ if $s$ is maximal in the set of objects where it is contained, that is, it cannot be extended. More formally, we say that:

Definition 1. A sequence $s \in \mathcal{S}$ is stable if there exists no sequence $s'$ with $s \subset s'$ s.t. they are both subsequences of the same set of objects (equivalently, input sequences).
For instance, taking data from figure 1, sequence $\langle (B)(D) \rangle$ is not stable since it can be extended to $\langle (B)(C)(D) \rangle$ in all the objects where it belongs. However, $\langle (B)(C)(D) \rangle$ or $\langle (A)(D) \rangle$ are stable sequences in $\mathcal{D}$. The set of all stable sequences of the data from figure 1 is shown in figure 2. The most relevant existing contributions on the mining of stable sequences are given by two algorithms, CloSpan [11] and BIDE [12], which find in a reasonably efficient way all the stable sequences for the input dataset $\mathcal{D}$. Stable sequences are called "closed" there; we prefer a different term to avoid confusion with the closure operator. Note, however, that these algorithms do not impose our condition of avoiding repetition of items in the input sequences.



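Objects   Stable Sequences
-------   ----------------
1,2,3     $\langle (B)(C)(D) \rangle$
1,2,3     $\langle (A) \rangle$
2,3       $\langle (B)(C)(A) \rangle$
1,3       $\langle (A)(D) \rangle$
1         $\langle (A)(B)(C)(D) \rangle$
2         $\langle (B)(C)(D)(A) \rangle$
3         $\langle (B)(C)(A)(D) \rangle$
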



Fig. 2. Set of stable sequences and objects where they belong. 
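The table of figure 2 can be reproduced with a brute-force reading of Definition 1. The sketch below is not CloSpan or BIDE; it simply enumerates every candidate pattern, which is only feasible for toy data. Sequences are written as strings of single items, and all function names are hypothetical.

```python
from itertools import combinations


def is_subseq(s, t):
    it = iter(t)
    return all(item in it for item in s)


def stable_sequences(data):
    """Map each stable sequence to the sorted list of objects containing it."""
    # Every candidate pattern is a non-empty subsequence of some input sequence.
    candidates = {
        "".join(c)
        for t in data.values()
        for k in range(1, len(t) + 1)
        for c in combinations(t, k)
    }
    support = {
        c: frozenset(o for o, t in data.items() if is_subseq(c, t))
        for c in candidates
    }
    # A pattern is stable if no proper extension lives in the same objects.
    return {
        c: sorted(support[c])
        for c in candidates
        if not any(
            c != d and is_subseq(c, d) and support[d] == support[c]
            for d in candidates
        )
    }


data = {1: "ABCD", 2: "BCDA", 3: "BCAD"}  # input sequences as listed in figure 2
stable = stable_sequences(data)
for pattern in sorted(stable, key=lambda p: (-len(stable[p]), p)):
    print(stable[pattern], pattern)
```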

As formalized in [4], the set of stable sequences can be characterized in
terms of a closure operator, named $\Delta$, operating on the universe of sets
of sequences. Briefly, the Galois connection defined there maps each family of
objects to the set of the maximal sequences that hold in all of them, and each
set of sequences to the set of all objects in which they hold. It is proved
there that these mappings indeed enjoy the properties of a Galois connection,
so that their composition provides the necessary closure operator. As usual, a
closed set of sequences is one that coincides with its closure, that is,
$\Delta(\{s_1, \ldots, s_n\}) = \{s_1, \ldots, s_n\}$ where
$\{s_1, \ldots, s_n\} \subseteq S$; similarly, a closed set of objects in this
ordered context is defined by the dual closure operator, i.e.
$\Delta^{-1}(O) = O$. A main result in [4] is that stable sequences are
exactly those that belong to a closed set of sequences.

As in any other Galois connection (see [6]), this characterization immediately
gives a lattice of formal concepts, that is, a graph where each node is a pair
$(O, \{s_1, \ldots, s_n\})$ where $\{s_1, \ldots, s_n\}$ are the stable
sequences belonging to the closed set of objects $O$, and vice versa. For
instance, in the data of figure 1 the set of sequences
$\{\langle (B)(C)(D) \rangle, \langle (A)(D) \rangle\}$ is stable for the first
and third objects; reciprocally, the set of objects formed by the first and
third transactions is closed for the set of stable sequences
$\{\langle (B)(C)(D) \rangle, \langle (A)(D) \rangle\}$. So,
$(\{o_1, o_3\}, \{\langle (B)(C)(D) \rangle, \langle (A)(D) \rangle\})$ will be
a formal concept of the lattice. It is also proved in [4] that all the stable
sequences mined by CloSpan or BIDE can be organized in different formal
concepts of the same lattice, and this graph of stable sequences characterizes
the non-redundant sequential patterns of the ordered data.

Interestingly enough for our subsequent contribution, the work in [4] also
proves that if we deal with input sequences rather than with objects, then the
set of stable sequences of a formal concept is exactly the intersection of
those input sequences renamed after the objects of the same concept. That is,
renaming each object by its input sequence in $\mathcal{D}$, a formal concept
$(O, \{s_1, \ldots, s_n\})$ becomes $(S, \{s_1, \ldots, s_n\})$ where $S$ are
the input sequences of $O$, and we always have the following property:
$\{s_1, \ldots, s_n\} = \bigcap_{s \in S} s$.
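The two mappings of the Galois connection, and the property that the sequences of a concept are the intersection of its input sequences, can be checked on the toy data with the following sketch. It is a brute-force illustration, not the construction of [4]; sequences are strings of single items and every function name is hypothetical.

```python
from itertools import combinations


def is_subseq(s, t):
    it = iter(t)
    return all(item in it for item in s)


def intersection(seqs):
    """Maximal common subsequences of a non-empty list of sequences."""
    first = seqs[0]
    common = {
        c
        for k in range(1, len(first) + 1)
        for c in combinations(first, k)
        if all(is_subseq(c, t) for t in seqs)
    }
    return {
        "".join(c)
        for c in common
        if not any(c != d and is_subseq(c, d) for d in common)
    }


def stable_concepts(data):
    """Map each closed set of objects to its sorted list of stable sequences."""
    concepts = {}
    objects = sorted(data)
    for k in range(1, len(objects) + 1):
        for group in combinations(objects, k):
            # Objects -> maximal sequences holding in all of them.
            patterns = intersection([data[o] for o in group])
            # Sequences -> all objects in which every one of them holds.
            extent = tuple(
                o for o in objects
                if all(is_subseq(p, data[o]) for p in patterns)
            )
            concepts[extent] = sorted(patterns)
    return concepts


data = {1: "ABCD", 2: "BCDA", 3: "BCAD"}  # input sequences as listed in figure 2
concepts = stable_concepts(data)
print(concepts[(1, 3)])  # ['AD', 'BCD']
```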



3 Partial Orders on Sequences 

Formally, we will model partial orders as a full subcategory of the set of directed 
graphs; so, as a starting point, we recall the most basic among the numerous 
definitions of graphs given in the literature. 

Definition 2. A directed graph is modeled as a triple $G = (V, E, l)$ where
$V$ is the set of vertices; $E \subseteq V \times V$ is the set of edges; and
$l$ is the injective labelling function mapping each vertex to an item, i.e.
$l : V \to X$.

The set $X$ in the labelling function is exactly the finite set of items
defined in the preliminaries; this will be a fixed set in all the graphs
belonging to our category. For our present work we consider that the labelling
function $l$ of a graph is injective (and not necessarily surjective). An edge
between two vertices $u$ and $v$ will be denoted by $(u, v) \in E$, implying a
direction on the edge from vertex $u$ to $v$.

Definition 3. A graph morphism $h : G \to G'$ between two graphs
$G = (V, E, l)$ and $G' = (V', E', l')$ consists of an injective function
$h_V : V \to V'$ that preserves labels (that is, $l' \circ h_V = l$) and
edges: $(u, v) \in E \Rightarrow (h_V(u), h_V(v)) \in E'$.

Note here that the injectivity of the morphism $h$ between any two graphs,
whose labelling functions must also be injective, forces the morphism $h$ to
be unique. So, if there are $h : G \to G'$ and $g : G' \to G$, it follows that
$G = G'$ (up to the naming of vertices) and that $h$ and $g$ are both the
identity. The composition of $h : G \to G'$ with a morphism $g : G' \to G''$
is the morphism $g \circ h : G \to G''$ consisting of the composed function
$g_V \circ h_V$. It is well known that the good properties of graph morphisms
turn the set of graphs into a category. From the category of graphs, we will
be especially interested in the following constructor operator.

Definition 4 (Coproduct). The coproduct of a family of graphs $\{G_i\}$ is a
graph $G = \coprod G_i$ in the same category and a set of morphisms
$\{h_i : G_i \to G\}$ such that, for every graph $G'$ and every family of
morphisms $\{h'_i : G_i \to G'\}$, there is a unique morphism $g : G \to G'$
such that $g \circ h_i = h'_i$.

In category-theoretic terms, the result $G$ of a coproduct is the initial
object among all those candidates $G'$. Moreover, we know that this $G$ is
unique in any coproduct since the morphisms $\{h_i\}$, $\{h'_i\}$ and $g$ are
injective and the family of graphs considered here have an injective labelling
function. Therefore, the coproduct of two graphs $G_1$ and $G_2$ in our
category defines exactly a union of $G_1$ and $G_2$ where vertices in $G_1$
and $G_2$ with the same label are joined, and where the injectivity of
morphisms ensures that all edges from both graphs are preserved.

Fig. 3. Coproduct diagram.
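Since labels are injective, a vertex can be identified with its label, and the union described above becomes a plain set union. The sketch below assumes this representation (a graph is a pair of a label set and a set of label pairs); `path_poset` and `coproduct` are hypothetical names.

```python
from itertools import combinations


def path_poset(sequence):
    """A sequence seen as a transitively closed total order on its items."""
    return set(sequence), set(combinations(sequence, 2))


def coproduct(graphs):
    """Union of graphs whose vertices are identified by their (unique) labels."""
    vertices, edges = set(), set()
    for v, e in graphs:
        vertices |= v  # vertices with the same label are joined
        edges |= e     # every edge of every graph is preserved
    return vertices, edges


g1 = path_poset("BCD")
g2 = path_poset("BCA")
vertices, edges = coproduct([g1, g2])
print(sorted(vertices))  # ['A', 'B', 'C', 'D']
print(sorted(edges))
# [('B', 'A'), ('B', 'C'), ('B', 'D'), ('C', 'A'), ('C', 'D')]
```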

From the set of all directed graphs, we will be interested in the full
subcategory that models partial orders. A partial order (also called poset) is
an acyclic directed graph $G_p = (V, E, l)$ such that the relation on $V$
established by the edges in $E$ is reflexive, antisymmetric and transitive.
The sources of a poset are those vertices that do not have any predecessor;
similarly, the sinks are those vertices that do not have any successor. Note
that a poset may have several unconnected components, and so it may have
several source nodes.





Fig. 4. Example of a partial order and its transitive reduction. 



The graphical representation of partial orders is particularly useful for
displaying results: we will display a poset by using arrows between the
connected labelled vertices, and the symbol || (parallel) to indicate trivial
order among the different components of a poset. The transitive reduction of
$G_p = (V, E, l)$ is the smallest relation resulting from deleting those edges
in $E$ that come from transitivity. Posets will be graphically depicted here
by means of their transitive reduction to make them more understandable (as in
figure 4(b)), but of course,
all edges of the transitive closure are indeed included in $E$ (figure 4(a)).
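The relation between the two drawings can be sketched as follows. The edge set used in the example is our reading of figure 4 from the paths named in the text (it is an assumption, since the figure itself is not reproduced here); edges are pairs of labels without self-loops, and the function names are hypothetical.

```python
def transitive_closure(edges):
    """Add every edge implied by transitivity until nothing changes."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def transitive_reduction(edges):
    """Drop every edge (a, c) for which some a -> b -> c also exists."""
    closure = transitive_closure(edges)
    nodes = {x for edge in closure for x in edge}
    return {
        (a, c)
        for (a, c) in closure
        if not any((a, b) in closure and (b, c) in closure for b in nodes)
    }


drawn = {("B", "C"), ("C", "D"), ("C", "A")}  # assumed edges of figure 4(b)
full = transitive_closure(drawn)
print(sorted(full - drawn))                  # [('B', 'A'), ('B', 'D')]
print(transitive_reduction(full) == drawn)   # True
```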

Some specific definitions we need on posets are the following ones.

Definition 5. We say that a poset $G_p$ is more general than another poset
$G_{p'}$, denoted by $G_p \preceq G_{p'}$, if there exists a morphism from
$G_p$ to $G_{p'}$, i.e. $h : G_p \to G_{p'}$. Then, we also say that $G_{p'}$
is more specific than $G_p$.

For instance, the partial order represented in figure 4 is more specific than
the trivial order || A, B, C, D || (parallelization of all items), but more
general than (the transitive closure of) the total order
$B \to C \to D \to A$.
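With injective labels, a morphism from one graph to another exists exactly when the labels and the labelled edges of the first are included in those of the second, so the generality test reduces to two subset checks. The sketch below assumes posets are stored as (labels, transitively closed edges); `more_general` is a hypothetical name and the edges of figure 4 are our reading of the text.

```python
def more_general(g, h):
    """True if there is a label-preserving injective morphism from g to h."""
    (v1, e1), (v2, e2) = g, h
    return v1 <= v2 and e1 <= e2


items = {"A", "B", "C", "D"}
trivial = (items, set())  # || A, B, C, D ||
fig4 = (items, {("B", "C"), ("C", "D"), ("C", "A"), ("B", "D"), ("B", "A")})
total = (items, {("B", "C"), ("B", "D"), ("B", "A"),
                 ("C", "D"), ("C", "A"), ("D", "A")})  # closure of B->C->D->A

print(more_general(trivial, fig4), more_general(fig4, total))  # True True
print(more_general(total, fig4))                               # False
```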

Definition 6. A partial order $G_p = (V, E, l)$ is compatible with a sequence
$s$ if: $\forall u \in V$ the item $l(u)$ is in $s$; and
$\forall (u, v) \in E$ we have that
$\langle (l(u))(l(v)) \rangle \subseteq s$.

In other words, a poset $G_p$ is compatible with a sequence $s$ if there
exists a morphism from $G_p$ to the poset obtained as the transitive closure
of the sequence $s$. We will see a sequence as a partial order containing all
the orders added by transitive closure, so we can express it for our
convenience as $G_p \preceq s$. For instance, the partial order of figure 4 is
compatible with the second and third input sequences of the data in example 1.
The trivial order is compatible with any sequence having all the items of the
poset; so, there will be at least one poset (the trivial one) compatible with
a given set of sequences. In case the sequences in $S$ do not have any item in
common, we assume the existence of an empty poset compatible with them.
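Definition 6 translates directly into a position check. The sketch assumes sequences of single items without repetitions (written as strings) and posets stored as (labels, edges); `compatible` is a hypothetical name, and the poset used is our reading of figure 4.

```python
def compatible(poset, s):
    """True if every label occurs in s and every edge agrees with the order of s."""
    vertices, edges = poset
    position = {item: i for i, item in enumerate(s)}
    return (
        all(v in position for v in vertices)
        and all(position[a] < position[b] for a, b in edges)
    )


fig4 = ({"A", "B", "C", "D"},
        {("B", "C"), ("C", "D"), ("C", "A"), ("B", "D"), ("B", "A")})
data = {1: "ABCD", 2: "BCDA", 3: "BCAD"}  # input sequences as listed in figure 2
print([o for o, s in data.items() if compatible(fig4, s)])  # [2, 3]
```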

Definition 7. We define a path from a poset $G_p = (V, E, l)$ as a sequence of
items $\langle (i_1)(i_2) \cdots (i_n) \rangle$ such that for all consecutive
$i_j$ and $i_{j+1}$ in the sequence, there exists $(u, v) \in E$ s.t.
$l(u) = i_j$ and $l(v) = i_{j+1}$.

For instance, sequences $\langle (B)(C)(D) \rangle$, or
$\langle (B)(A) \rangle$, or $\langle (B)(C)(A) \rangle$ define paths of the
poset shown in figure 4(a). We define a path to be maximal with respect to the
inclusion of sequences. E.g., path $\langle (B)(A) \rangle$ is not maximal
since it is a subsequence of the path $\langle (B)(C)(A) \rangle$. Note that
posets are acyclic, so the maximal paths in a poset $G_p$ will always be
defined between sources and sinks of $G_p$ (of course, avoiding the shortcuts
of the transitive closure). Note also that since a poset is actually a graph,
we are still able to operate coproducts on them; although this does not
necessarily imply that the coproduct of two partial orders
is another partial order.
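Maximal paths can be enumerated by walking the transitive reduction from each source to each sink, as in the following sketch. It assumes the poset is given by its labels and its transitively closed edges without self-loops; `maximal_paths` is a hypothetical name and the example poset is our reading of figure 4.

```python
def maximal_paths(vertices, edges):
    """Source-to-sink paths that avoid the shortcuts of the transitive closure."""
    # Keep only covering edges: drop (a, c) whenever a -> b -> c also exists.
    cover = {
        (a, c)
        for (a, c) in edges
        if not any((a, b) in edges and (b, c) in edges for b in vertices)
    }
    successors = {v: sorted(c for (a, c) in cover if a == v) for v in vertices}
    sources = sorted(v for v in vertices if not any(c == v for (_, c) in cover))
    paths = []

    def walk(path):
        if not successors[path[-1]]:  # reached a sink
            paths.append("".join(path))
        for nxt in successors[path[-1]]:
            walk(path + [nxt])

    for source in sources:
        walk([source])
    return paths


fig4_vertices = {"A", "B", "C", "D"}
fig4_edges = {("B", "C"), ("C", "D"), ("C", "A"), ("B", "D"), ("B", "A")}
print(maximal_paths(fig4_vertices, fig4_edges))  # ['BCA', 'BCD']
```

An isolated vertex is both a source and a sink, so it yields a path made of a single item.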

Next, we will consider the goal of summarizing the input sequences as the 
most specific partial orders compatible with them. 

3.1 A Closure System of Partial Orders 

This section presents our first contribution: the construction and visual display 
of a concept lattice where nodes contain partial orders, and the relationships 
among them will be representative of the relationships in the input ordered 
data. We will construct this lattice by avoiding the formalization of a closure 
operator; however, we will show that the constructed family of formal concepts 
is indeed a closure system. 







We say that a partial order $G_p$ is closed for a set of sequences $S$ if
$G_p$ is the most specific poset among all those posets compatible with all
$s \in S$. For instance, given the set of sequences
$S = \{\langle (B)(C)(A) \rangle, \langle (B)(C)(D) \rangle\}$ we have two
maximal posets compatible with them: the trivial order $G_{p_1} =$ || B, C ||
and the total order $G_{p_2} = B \to C$; but only $G_{p_2}$ is closed for $S$
since $G_{p_1} \preceq G_{p_2}$. Proposition 1 below ensures the uniqueness of
closed posets for a set of sequences $S$.

Proposition 1. Given a set of sequences $S$ there is exactly one closed poset
for $S$.

Proof. As mentioned after definition 6, there is at least one poset compatible
with all the sequences in $S$, which is the trivial order with the shared
items of all sequences in $S$. Now, assume to the contrary that we had two
most specific posets compatible with all $s \in S$, named $G_{p_1}$ and
$G_{p_2}$, with corresponding morphisms into $s$. If so, we can construct a
third poset $G$ from the union of $G_{p_1}$ and $G_{p_2}$, i.e. their
coproduct $G = \coprod G_{p_i}$ for $i = 1, 2$ (as we mentioned, the union of
two graphs in this category can be formalized as their coproduct). The
properties of the coproduct ensure that $G$ is unique, compatible with all
$s \in S$, and more specific than $G_{p_1}$ and $G_{p_2}$. □
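For sequences without repeated items, the closed poset can also be computed directly from the definition: it keeps the shared items and every ordered pair on which all sequences agree. This is a sketch of that reading, not the coproduct-based construction used in the paper; `closed_poset` is a hypothetical name and sequences are strings of single items.

```python
def closed_poset(seqs):
    """Most specific poset compatible with every sequence in seqs."""
    shared = set(seqs[0]).intersection(*seqs[1:])
    edges = {
        (a, b)
        for a in shared
        for b in shared
        if a != b and all(s.index(a) < s.index(b) for s in seqs)
    }
    return shared, edges


vertices, edges = closed_poset(["BCA", "BCD"])
print(sorted(vertices), sorted(edges))  # ['B', 'C'] [('B', 'C')]
```

Any poset compatible with all the sequences can only contain pairs of this kind, which is why the result is the most specific one.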






Fig. 5. Example of a coproduct. 



An important remark follows from this proof: the coproduct is an operator
defined on graphs, so the coproduct of two partial orders may give a graph
that is not another partial order. For instance, the coproduct of
$G_{p_1} = A \to B$ and $G_{p_2} = B \to A$ leads to a graph with a cycle
(hence not antisymmetric). However, in the proof of proposition 1 we are
operating the coproduct on two posets simultaneously compatible with the same
set of sequences $S$. Given that the sequences in $S$ do not have repeated
items by definition, it is not possible to get a cycle from the coproduct of
two posets compatible with $S$; in other words, two partial orders whose union
leads to a cycle cannot both be compatible with the same set $S$.
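The counterexample can be replayed with a union of edge sets followed by a depth-first cycle check. The sketch assumes graphs stored as (labels, edges); `coproduct` and `has_cycle` are hypothetical names.

```python
def coproduct(graphs):
    """Union of graphs whose vertices are identified by their labels."""
    vertices, edges = set(), set()
    for v, e in graphs:
        vertices |= v
        edges |= e
    return vertices, edges


def has_cycle(vertices, edges):
    """Depth-first search that reports a cycle when it meets an open vertex."""
    successors = {v: [b for (a, b) in edges if a == v] for v in vertices}
    state = {}  # vertex -> "open" while on the DFS stack, "done" afterwards

    def visit(v):
        if state.get(v) == "open":
            return True
        if state.get(v) == "done":
            return False
        state[v] = "open"
        found = any(visit(w) for w in successors[v])
        state[v] = "done"
        return found

    return any(visit(v) for v in vertices)


ab = ({"A", "B"}, {("A", "B")})
ba = ({"A", "B"}, {("B", "A")})
print(has_cycle(*coproduct([ab, ba])))  # True: the union is not a partial order

bcd = ({"B", "C", "D"}, {("B", "C"), ("B", "D"), ("C", "D")})
bca = ({"B", "C", "A"}, {("B", "C"), ("B", "A"), ("C", "A")})
print(has_cycle(*coproduct([bcd, bca])))  # False
```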

We say that a set of sequences $S$ is closed for a partial order $G_p$ if $S$
contains all the input sequences $s_i \in \mathcal{D}$ with which $G_p$ is
compatible. For example, the set of input sequences from figure 1 that is
closed with respect to the poset in figure 4 is
$S = \{s_2, s_3\} = \{\langle (B)(C)(D)(A) \rangle,
\langle (B)(C)(A)(D) \rangle\}$.

Now we are ready to define the notion of formal concept.

Definition 8. A formal concept is a pair $(S, G_p)$ where $G_p$ is a closed
partial order for the set of sequences $S$, and the set of sequences $S$ is
closed for the partial order $G_p$.

Formal concepts $(S, G_p)$ will be nodes of the concept lattice of partial
orders. In practice, we will visualize these nodes principally by the closed
poset $G_p$ of the concept, and the dual $S$ will be added as a list of object
identifiers (thus, as happens in general in Galois connections, these lists
form a dual view of the same lattice that, in our case, is ordered by
set-theoretic inclusion downwards); proposition 1 ensures that each node of
the lattice can be represented by one single closed partial order. Edges in
the lattice will be the specificity relationships among the concepts, named
$\preceq$, such that $(S_1, G_{p_1}) \preceq (S_2, G_{p_2})$ if
$G_{p_1} \preceq G_{p_2}$. The set of all concepts ordered by $\preceq$ is
called the concept lattice of partial orders of our context. Finally, an
artificial top representing a poset not belonging to any sequence is also
added to the lattice, and we denote it by the unsatisfiable boolean constant
$\bot$. In figure 6 we show the lattice of closed concepts for the data of
figure 1.




Fig. 6. Concept Lattice of partial orders for data in figure 1. 



Although the lattice of closed partial orders has been characterized without
defining any specific closure operator that ensures $\cap$-stability on
concepts, we can indeed prove that our system is closed under intersection.
Semantically speaking, intersection must preserve the maximal common
substructure from the intersected objects. Again, because of the injectivity
of the labelling function along with the uniqueness of morphisms, the
intersection of two posets can be easily formalized by the product operator of
category theory (dual to the coproduct). However, we now introduce a
completely different characterization of intersection of posets in terms of
the coproduct of maximal paths.

Definition 9. The intersection of two partial orders is
$G_p \cap G_{p'} = \coprod t_i$, where $\{t_i\}$ is the family of maximal
paths contained in both $G_p$ and $G_{p'}$.

The next lemma proves that the intersection as defined here indeed preserves
the maximal common substructure from the intersected partial orders.

Lemma 1. The result of $G_{p_1} \cap G_{p_2}$ is the most specific poset $G_p$
among those where $G_p \preceq G_{p_1}$ and $G_p \preceq G_{p_2}$.

Proof sketch. If $G_p = \coprod t_i$ where $\{t_i\}$ is the family of maximal
paths contained in both $G_{p_1}$ and $G_{p_2}$, then by construction there
exist morphisms $G_p \preceq G_{p_1}$ and $G_p \preceq G_{p_2}$. Any other
poset $G_{p'}$ having morphisms to $G_{p_1}$ and $G_{p_2}$ as well must have a
morphism to $G_p$, i.e. $G_{p'} \preceq G_p$; otherwise we would contradict
the fact that some path $t_i$ used in the coproduct is a maximal path. Thus,
we get exactly a product diagram; so, $G_p$ defines the result of a product
operation on partial orders from category theory. □

Note that intersection of posets is then associative. Proposition 2 below
finally shows that our lattice is a system closed under intersection.

Proposition 2. The intersection of closed posets is another closed poset.

Proof sketch. We argue by contradiction: let $(G_{p_1}, S_1)$ and
$(G_{p_2}, S_2)$ be two different closed partial orders for $S_1$ and $S_2$
respectively, and suppose that $G_{p_1} \cap G_{p_2} = G_p$ is not another
closed poset, i.e. $G_p$ is not a part of any concept. In this case, let
$G_{p'}$ be the closed poset immediately over $G_p$; that is,
$G_p \preceq G_{p'}$ and there does not exist another closed $G_{p''}$ s.t.
$G_{p''} \preceq G_{p'}$ and $G_p \preceq G_{p''}$. Then, it must be true that
$G_{p'} \preceq G_{p_1}$ and $G_{p'} \preceq G_{p_2}$ (because both $G_{p_1}$
and $G_{p_2}$ are closed and different from $G_p$). But then, by lemma 1 we
get a contradiction since $G_p$ is not the most specific poset s.t.
$G_p \preceq G_{p_1}$ and $G_p \preceq G_{p_2}$. □
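Proposition 2 can be observed on the toy data. With unique labels and transitively closed edge sets, the most specific common generalization of two posets (lemma 1) amounts to intersecting their labels and their edges; the sketch uses this shortcut instead of the maximal-path construction of definition 9, and the function names are hypothetical.

```python
def closed_poset(seqs):
    """Most specific poset compatible with every sequence in seqs."""
    shared = set(seqs[0]).intersection(*seqs[1:])
    edges = {
        (a, b)
        for a in shared
        for b in shared
        if a != b and all(s.index(a) < s.index(b) for s in seqs)
    }
    return shared, edges


def poset_intersection(p, q):
    """Common labels and common (transitively closed) edges of two posets."""
    return p[0] & q[0], p[1] & q[1]


g1 = closed_poset(["ABCD"])  # closed poset of the concept of object 1
g3 = closed_poset(["BCAD"])  # closed poset of the concept of object 3
meet = poset_intersection(g1, g3)
print(sorted(meet[1]))
# [('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')]
print(meet == closed_poset(["ABCD", "BCAD"]))  # True: again a closed poset
```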

This section shows that the input sequential data can be summarized by means
of a concept lattice that presents a balance between generality and
specificity for all the input sequences. As an example, observe the lattice of
figure 6, which gives an overview of the ordering relationships in the data.
We also see that partial orders such as || A, B, C, D || or $B \to D$ or
|| A, $B \to D$ || do not create a node in the concept lattice, i.e. they are
not closed: these posets turn out to be redundant in describing our data,
since they are compatible with all the input sequences but are not specific
enough to be closed.

The problem now is how to construct this useful concept lattice of closed
partial orders. Dealing with poset structures is still a challenge in the
field of Knowledge Discovery: mining the partial orders directly from the data
is a complex task incurring a substantial runtime overhead. For example, the
work in [7] presents a method based on viewing a partial order as a generative
model for a set of sequences, and it applies different mixture model
techniques. The final posets are not necessarily closed and so they could be
redundant; besides, the authors restrict their attention to a subset of
partial orders called series-parallel partial orders (such as series-parallel
digraphs) to avoid computational problems. Note that here we do not restrict
in any sense the form of the final closed posets. Another work worth
mentioning here is [8], where the authors present the mining of episodes
(whose hybrid structures are indeed partial orders). However, again these
structures are not closed in any lattice and computational problems still
persist.



4 Coproduct Transformations to Closed Posets 

In this section we show that our lattice of closed posets can be indeed obtained 
via coproduct transformations on the simpler lattice of stable sequences; thus, 
providing an efficient way to derive those closed posets. We present, next, one 
of the main results that will lead to our characterization. 

Theorem 1. Let $G_p$ be the most specific partial order compatible with a set
of sequences $S$; then the set of maximal paths of poset $G_p$ defines exactly
the intersection of all sequences in $S$, that is, $\bigcap_{s \in S} s$.

Proof. We will prove both directions: maximal paths from $G_p$ come from the
intersection of all $s \in S$; and the intersections of all $s \in S$ define
the maximal paths of $G_p$.

$\Rightarrow$) Let $t = \langle (i) \ldots (j) \rangle$ define a maximal path
between a source node, labelled with item $i$, and a sink node, labelled with
item $j$, of the poset $G_p$, and suppose that $t$ does not belong to the
intersection of sequences in $S$. This implies that either (1) $t$ does not
belong to some $s \in S$; or (2) $t$ belongs to all $s \in S$ but $t$ is not
maximal (i.e., there exists another $t'$ belonging to the intersection of $S$
s.t. $t \subset t'$). In the first case, it would mean that we started from a
poset $G_p$ not compatible with the set of sequences $S$; in the second case,
it would mean that we started from a poset $G_p$ which is not the most
specific, since with $G_p \cup t'$ (formalized as the coproduct) we would get
a more specific poset. In any case we are contradicting the original
formulation of $G_p$ in the theorem, so it is true that all maximal paths in
$G_p$ come from the intersection of sequences in $S$.

$\Leftarrow$) Let $t$ be a sequence belonging to the intersection of all
$s \in S$, such that it does not define a maximal path between a source and a
sink in $G_p$. This implies that $G_p$ is not the most specific for $S$, since
we could add the path defined by $t$ to the poset $G_p$ and get a more
specific poset still compatible with all of $S$. Again, we reach a
contradiction that makes the original statement true. We must insist once more
on the fact that sequences in $S$ do not have repeated items by definition, so
the sequence $t$ considered here always leads to a path for $G_p$ with no
cycles. □

To illustrate this theorem, let us consider any formal concept $(S, G_p)$ of
our lattice of partial orders: the maximal paths of the closed poset $G_p$ are
defined exactly by the intersection of all sequences in the closed set of
sequences $S$. For instance, taking the lattice in figure 6: the intersection
of the closed set $S = \{s_2, s_3\} = \{\langle (B)(C)(D)(A) \rangle,
\langle (B)(C)(A)(D) \rangle\}$ is the set of sequences
$\{\langle (B)(C)(D) \rangle, \langle (B)(C)(A) \rangle\}$, which coincides
exactly with the maximal paths of the closed poset $G_p$ in the same node. The
next corollary follows immediately from the main theorem.

Corollary 1. Let $G_p$ be the most specific partial order compatible with a
set of sequences $S$; then the set of all paths from $G_p$ is defined exactly
by the subsequences of some sequence in the intersection of $S$, that is,
$t' \subseteq \bigcap_{s \in S} s$.

Proof. It immediately follows from theorem 1: if the maximal paths from $G_p$
are defined by the intersection of all $s \in S$, then any subsequence of a
maximal path defines a shorter path in $G_p$. □

Clearly, after corollary 1 it is possible to reconstruct the transitive
closure of a closed poset $G_p$ with the intersections of the closed set of
sequences $S$ where $G_p$ is maximally contained. The coproduct transformation
will help in this procedure, as shown in theorem 2 below. Again, for our
notational purposes we consider that a sequence is indeed a poset with all the
proper edges added by transitive closure.

Theorem 2. Let $G_p$ be the most specific partial order compatible with a set
of sequences $S$; then $G_p$ is the result of the coproduct transformation on
$\bigcap_{s \in S} s$.

Proof. We know by theorem 1 that the intersections of sequences in $S$ are
exactly the maximal paths of the most specific partial order $G_p$ compatible
with $S$. Then, it is possible to prove that a poset $G_p$ comes from the
coproduct transformations of its maximal paths. This can be easily proved by
induction on the number of paths of the poset $G_p$, taking into account that
the coproduct operator is associative. Moreover, because these paths come from
intersections of $S$ and $S$ is restricted to not having repetition of items,
the coproduct transformations do not lead to any cycle here. □

This last theorem concludes that the closed poset $G_p$ from a formal concept
$(S, G_p)$ can be generated by the coproduct transformations on the
intersections of $S$. Besides, for the same reason given after proposition 1,
we can be sure that coproduct transformations always return a valid poset in
theorem 2: two posets $G_{p_1}$ and $G_{p_2}$ whose coproduct transformation
has a cycle cannot both be compatible with the same set of sequences $S$ at
the same time. To exemplify theorem 2, let us take the closed set of sequences
$S = \{s_2, s_3\} = \{\langle (B)(C)(D)(A) \rangle,
\langle (B)(C)(A)(D) \rangle\}$, whose intersection gives the set of sequences
$\{\langle (B)(C)(D) \rangle, \langle (B)(C)(A) \rangle\}$. The coproduct
transformation on these intersections is given in figure 5, and it returns
exactly the closed poset in that formal concept.
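This example can be replayed in code: the coproduct of the intersection sequences is compared with the most specific compatible poset computed directly from the definition. The sketch assumes single items without repetitions (sequences as strings), treats each sequence as a transitively closed total order, and uses hypothetical function names throughout.

```python
from itertools import combinations


def is_subseq(s, t):
    it = iter(t)
    return all(item in it for item in s)


def intersection(seqs):
    """Maximal common subsequences of a non-empty list of sequences."""
    first = seqs[0]
    common = {
        c
        for k in range(1, len(first) + 1)
        for c in combinations(first, k)
        if all(is_subseq(c, t) for t in seqs)
    }
    return {
        "".join(c)
        for c in common
        if not any(c != d and is_subseq(c, d) for d in common)
    }


def path_poset(sequence):
    """A sequence seen as a poset with all edges of its transitive closure."""
    return set(sequence), set(combinations(sequence, 2))


def coproduct(posets):
    vertices, edges = set(), set()
    for v, e in posets:
        vertices |= v
        edges |= e
    return vertices, edges


def closed_poset(seqs):
    """Most specific compatible poset, read directly off the definition."""
    shared = set(seqs[0]).intersection(*seqs[1:])
    edges = {
        (a, b)
        for a in shared
        for b in shared
        if a != b and all(s.index(a) < s.index(b) for s in seqs)
    }
    return shared, edges


S = ["BCDA", "BCAD"]  # s_2 and s_3
built = coproduct(path_poset(t) for t in intersection(S))
print(sorted(intersection(S)))   # ['BCA', 'BCD']
print(built == closed_poset(S))  # True
```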

The next section explains the relation of these intersections of $S$ with the
stable sequences, thus reaching a final lattice transformation of stable
sequences to closed posets by means of coproduct operations on its nodes.

4.1 Lattice Transformation and Algorithmic Consequences

As introduced in the preliminaries, the work in [4] shows that stable
sequences can be characterized by the closure operator $\Delta$. Therefore,
the set of all stable sequences from a database $\mathcal{D}$ can be organized
in formal concepts $(O, \{s_1, \ldots, s_n\})$, where
$\{s_1, \ldots, s_n\}$ is a set of stable sequences for the closed set of
objects $O$. Since each object in the ordered context represents indeed an
input sequence from $\mathcal{D}$, we can rewrite the formal concepts as
$(S, \{s_1, \ldots, s_n\})$ where $S$ is the set of input sequences renamed
after $O$. From this point of view it is possible to prove as in [4] that
stable sequences are the intersection of $S$, that is,
$\{s_1, \ldots, s_n\} = \bigcap_{s \in S} s$.

Actually, this last observation leads naturally to the following theorem. 




Fig. 7. Lattice transformation to closed partial orders. 



Theorem 3. A lattice of stable sequences can be transformed into a lattice of
closed partial orders by rewriting each node via coproduct transformations.

Proof. Let $(O, \{s_1, \ldots, s_n\})$ be a formal concept of stable
sequences, which we can rewrite as $(S, \{s_1, \ldots, s_n\})$. We have that
$\{s_1, \ldots, s_n\}$ are stable sequences (maximal) in $S$, and $S$ is the
set of all the input sequences in which they appear. Let us construct a
partial order $G_p$ via coproduct transformations on these stable sequences
$\{s_1, \ldots, s_n\}$. We have that $G_p$ is the most specific partial order
for $S$: because $\{s_1, \ldots, s_n\}$ is equivalent to
$\bigcap \{s \mid s \in S\}$ (as shown in [4]), it immediately follows from
theorem 1 and theorem 2 that $G_p$ is the most specific poset for $S$. Thus,
$G_p$ is a closed poset for $S$. At the same time, $S$ must be the maximal set
of input sequences for the poset $G_p$, so as not to contradict the fact that
$O$ is a maximal set of objects for the stable sequences
$\{s_1, \ldots, s_n\}$; so, $S$ is a closed set of input sequences for $G_p$.
Therefore, $(S, G_p)$ can be derived from $(S, \{s_1, \ldots, s_n\})$ via
coproduct transformations. □

Figure 7 shows the set of all stable sequences from the data in figure 1
organized in formal concepts. Each node of this lattice represents a set of
stable sequences together with the list of object identifiers (input
sequences) where they are maximally contained. We partially order sets of
stable sequences in the lattice as in [4]:
$\{s_1, \ldots, s_n\} \preceq \{s'_1, \ldots, s'_m\}$ if and only if
$\forall s_i \, \exists s'_j$ s.t. $s_i \subseteq s'_j$. It can be graphically
seen that each node of the lattice of stable sequences can be transformed into
a node of the lattice of partial orders by means of a coproduct. Dashed lines
in figure 7 indicate a node rewriting by means of a coproduct transformation
on the stable sequences.

In practice, this transformation has several consequences. Mining complex
structured patterns directly from the data, such as partial orders, graphs,
molecules, etc., is a complex task due to the overhead incurred by the
algorithms (see [7, 8]). However, the presented lattice transformation allows
the construction of closed posets in the data by transforming the graph of
stable sequences. So, the algorithmic contributions to the mining of stable
sequences, such as CloSpan [11] and the still more efficient BIDE [12], can be
used to mine partial orders by performing coproduct transformations on the
discovered stable sequences.
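The whole pipeline can be sketched on the toy data as follows. A brute-force enumeration stands in for CloSpan or BIDE (it is only a placeholder for a real miner), sequences are strings of single items without repetitions, and all function names are hypothetical.

```python
from itertools import combinations


def is_subseq(s, t):
    it = iter(t)
    return all(item in it for item in s)


def intersection(seqs):
    """Maximal common subsequences of a non-empty list of sequences."""
    first = seqs[0]
    common = {
        c
        for k in range(1, len(first) + 1)
        for c in combinations(first, k)
        if all(is_subseq(c, t) for t in seqs)
    }
    return {
        "".join(c)
        for c in common
        if not any(c != d and is_subseq(c, d) for d in common)
    }


def path_poset(sequence):
    return set(sequence), set(combinations(sequence, 2))


def coproduct(posets):
    vertices, edges = set(), set()
    for v, e in posets:
        vertices |= v
        edges |= e
    return vertices, edges


def transform(data):
    """Rewrite each concept of stable sequences into its closed poset."""
    lattice = {}
    objects = sorted(data)
    for k in range(1, len(objects) + 1):
        for group in combinations(objects, k):
            stable = intersection([data[o] for o in group])
            extent = tuple(
                o for o in objects
                if all(is_subseq(p, data[o]) for p in stable)
            )
            # Node rewriting: coproduct of the stable sequences of the concept.
            lattice[extent] = coproduct(path_poset(p) for p in stable)
    return lattice


data = {1: "ABCD", 2: "BCDA", 3: "BCAD"}  # input sequences as listed in figure 2
lattice = transform(data)
for extent in sorted(lattice, key=len, reverse=True):
    print(extent, sorted(lattice[extent][1]))
```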



4.2 Simultaneity Condition Revisited 

To consider input sequences $s = \langle (I_1)(I_2) \ldots (I_n) \rangle$
where each $I_i$ may contain several simultaneous items, we must redefine the
kind of posets we deal with. In particular, the labelling function $l$ must be
a function mapping each vertex to a set of items, i.e. $l : V \to 2^X$. Again,
this function is required to avoid repetitions of items in the nodes of the
partial order (since we are not allowing repetitions in the input sequences
either); so, the injectivity condition of $l$ implies here that the items in
each node are unique in the poset. With the new labelling function, a path
from a poset is a sequence of itemsets and not of single items. All the
presented results hold with the new definitions.

5 Conclusions and Further Work 

This paper shows that a simple lattice of stable sequences can be transformed
into a lattice of closed partial orders: the first step proves that the
maximal paths of each closed poset $G_p$ are characterized by the
intersections of those input sequences where $G_p$ is contained. Then, the
convenient coproduct operator is used to naturally allow the formalization of
$G_p$ as the transformation on those intersections. The last step is to prove
that the intersections of input sequences are indeed stable sequences, so that
it is possible to rewrite each node of the lattice of stable sequences into a
closed partial order.

This transformation performed on lattices is of great importance in the field 
of Knowledge Discovery, where the mining of partial orders directly from the 
data is a complex task. In particular, it implies that algorithms for mining plain 
stable sequences can efficiently transform the discovered patterns into a lattice 
of closed partial orders that best summarizes the data. 

The next step to complete our transformation framework would be to consider
repetition of items in the input sequences, and so to allow a non-injective
labelling function in graphs. In this case, the coproduct operator does not
work because the result of the transformation might not be unique. However,
the pushout operator seems to fit the new conditions: the pushout constructor
can be used to formalize exactly the union and intersection of partial orders,
and to allow the transformation process on the intersections of sequences. The
problem with the pushout operator is to define a convenient target graph so
that the pushout diagram commutes; it is still unclear at the moment how to
achieve this. As future work we also plan to study other complex structures,
such as concept lattices of graphs or molecules: there we hope to find out
which substitution mechanisms are required to obtain closed structures for
more complex data.



Abstract. A string generating hypergraph grammar is a hyperedge
replacement grammar where the resulting language consists of string
graphs, i.e. hypergraphs modeling strings. With the help of these grammars,
string languages like $a^n b^n c^n$ can be modeled that cannot be generated
by context-free grammars for strings. They are well suited to model
discontinuous constituents in natural languages, i.e. constituents that
are interrupted by other constituents. For parsing context-free Chomsky
grammars, the Earley parser is well known. In this paper, an Earley
parser for string generating hypergraph grammars is presented, leading
to a parser for natural languages that is able to handle discontinuities.



1 Discontinuous Constituents in German 

One (of many) problems when parsing German is that of discontinuous
constituents [1]. Discontinuous constituents are constituents which are
separated by one or more other constituents and still belong together on a
semantic or syntactic level. An example¹ of a discontinuous constituent is

(1) Er hat schnell gearbeitet.

He has fast worked.

He (has) worked fast.

The verb phrase hat gearbeitet ((has) worked)² is distributed; the finite verb
part, the auxiliary verb hat (has), is always in the second position in a
German declarative sentence. The non-finite verb part, the past participle
gearbeitet (worked), is usually in the last position of a declarative
sentence; only a few exceptions like relative clauses or appositions can be
put after the non-finite verb part. Another (more complicated) German example
of discontinuous constituents is



¹ The German examples are first translated word by word into English to explain the
German sentence structure and then reordered into a correct English sentence.

² The present perfect in German can be translated either in present perfect or in past
tense in English.



H. Ehrig et al. (Eds.): ICGT 2004, LNCS 3256, pp. 352-367, 2004. 
(c) Springer-Verlag Berlin Heidelberg 2004
Fig. 1. The phrase structure tree for the German sentence Er hat schnell gearbeitet
(He (has) worked fast).



(2) Kleine grüne Autos habe ich keine gesehen die mir gefallen hätten.

Small green cars have I none seen that me pleased would have. 

I did not see any small green cars that would have pleased me. 

Here, the noun phrase keine kleinen grünen Autos, die mir gefallen hätten (any
small green cars that would have pleased me) is distributed in three parts³. There
are also discontinuous conjunctions as in

(3) Weder Max noch Lisa haben die Aufgabe verstanden. 

Neither Max nor Lisa have the task understood. 

Neither Max nor Lisa understood the task. 

In this example, the discontinuity of weder . . . noch is also present in the English 
translation neither . . . nor. 

It is typical for all the above examples that the syntactic or semantic
connection between the two parts of the discontinuous constituent cannot be
expressed with general context-free Chomsky grammars. One would wish for
a phrase structure tree as shown in Figure 1 for example (1). Of course it is
possible to construct a weakly equivalent context-free Chomsky grammar to
parse such a sentence, but it must contain some work-around for the
discontinuous constituent, such as attaching one part of the discontinuous
constituent in a different production than the other. The main advantage of
context-free string generating hypergraph grammars lies in their ability to
describe discontinuous constituents in a context-free formalism. A more detailed
description can be found in [3]. It is desirable to have a parser based on this
representation formalism for discontinuous constituents. With the help of a
parser, larger grammar descriptions can be developed and tested. An often used
parser for context-free Chomsky grammars is the Earley parser [4]. The main goal
of this paper is to describe an Earley parser for context-free string generating
hypergraph grammars. To this end, first a short introduction to this type of
grammar is given and its application in natural language modeling is shown.
Related work is briefly described. Finally the modified Earley parser is
described in detail.



³ This syntactic phenomenon is also called split topicalization [2].
Fig. 2. The hypergraph representing $a^2 b^2 c^2$.



2 String Generating Hypergraph Grammars 

Hypergraph grammars have been studied extensively in the last decades.
Introductions and applications can be found in [5], [6]. A subset of hypergraph
grammars, context-free string generating hypergraph grammars, are described
in detail in [7], [8], [5]. In this section, only an overview of the most important
definitions is given.

A labeled directed hyperedge is a labeled edge that is connected not only
to one source and one target node like an ordinary graph edge, but to

— a sequence of arbitrary length of source nodes $(s_1, s_2, \ldots, s_m)$ and

— a sequence of arbitrary length of target nodes $(t_1, t_2, \ldots, t_n)$ with $m, n > 0$.

The points where the hyperedge is connected to hypergraph nodes are called
tentacles. The type (m, n) of a hyperedge consists of the number of source
tentacles m and the number of target tentacles n.

A hypergraph $(E, V, s, t, l, b, f)$ consists of

— a finite set of hyperedges $E$

— a finite set of nodes $V$

— a source function $s : E \to V^*$ assigning a sequence of source nodes to each
edge

— a target function $t : E \to V^*$ assigning a sequence of target nodes to each
edge

— a labeling function $l : E \to A$ assigning a label from a given alphabet $A$ to
each edge; the label of an edge determines its type

— a sequence of external source nodes $b$

— a sequence of external target nodes $f$

A hypergraph has the type (m, n) if it has m external source nodes and n
external target nodes. A hypergraph H with nodes $\{v_0, v_1, \ldots, v_n\}$ and edges
$\{e_1, \ldots, e_n\}$ is called a string graph if $s(e_i) = v_{i-1}$ and $t(e_i) = v_i$ for
$i = 1, \ldots, n$. The node $v_0$ is the only external source node and $v_n$ is the only
external target node. This string hypergraph is a hypergraph based representation
of a normal string. The letters (or words) of the string are the labels of the
hyperedges. The letters labeling the hyperedges must be ordered in the string
graph in the same way as in the underlying string. The hypergraph representation
of the string $a^2 b^2 c^2$ is shown in Figure 2.
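
As a rough illustration of the string graph definition, the following Python sketch builds the string graph of a word and checks whether a hypergraph is a string graph. The names Hyperedge, Hypergraph, string_graph and read_string are assumptions made for this illustration and do not come from the paper; nodes are simply numbered 0 to n.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperedge:
    label: str
    sources: tuple  # sequence of source nodes
    targets: tuple  # sequence of target nodes


@dataclass(frozen=True)
class Hypergraph:
    nodes: tuple
    edges: tuple
    ext_sources: tuple  # external source nodes (b in the text)
    ext_targets: tuple  # external target nodes (f in the text)


def string_graph(word):
    """Build the string graph of `word`: edge i runs from node i to node i+1."""
    n = len(word)
    edges = tuple(Hyperedge(ch, (i,), (i + 1,)) for i, ch in enumerate(word))
    return Hypergraph(tuple(range(n + 1)), edges, (0,), (n,))


def read_string(g):
    """Return the edge labels in order if g is a string graph, else None."""
    if len(g.ext_sources) != 1 or len(g.ext_targets) != 1:
        return None
    # Every edge of a string graph has exactly one source and one target.
    if any(len(e.sources) != 1 or len(e.targets) != 1 for e in g.edges):
        return None
    out_edge = {}
    for e in g.edges:
        if e.sources[0] in out_edge:
            return None  # two edges leave the same node
        out_edge[e.sources[0]] = e
    node, labels, seen = g.ext_sources[0], [], set()
    while node in out_edge:
        if node in seen:
            return None  # cycle
        seen.add(node)
        labels.append(out_edge[node].label)
        node = out_edge[node].targets[0]
    # The walk must end at the external target and use all edges and nodes.
    if node != g.ext_targets[0]:
        return None
    if len(labels) != len(g.edges) or len(seen) + 1 != len(g.nodes):
        return None
    return labels


print(read_string(string_graph("aabbcc")))
```

The seven nodes of the graph built for aabbcc correspond to the node numbering at the bottom of Fig. 2.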

A hyperedge replacement rule consists of a left hand side edge that is
replaced by the hypergraph on the right hand side of the rule. Hyperedge and
hypergraph must have the same type. In Figure 3 the hyperedge replacement
rules generating the string language $a^n b^n c^n$ are shown.
Fig. 3. Hyperedge replacement rules to generate the string language $a^n b^n c^n$.

Fig. 4. The derivation of $a^3 b^3 c^3$ using the grammar given in Fig. 3.



When replacing a hyperedge with a graph as given in a production, the edge
is first removed from its original host graph. In the resulting hole, the right
hand side graph of the production is inserted except for its external nodes. The
attaching nodes of the tentacles of the removed hyperedge are used instead: the
ith source or target node of the removed edge replaces the ith external source
or target node of the right hand side hypergraph. The derivation for $a^3 b^3 c^3$
is given in Figure 4 using the productions given in Figure 3. In Figure 5 the
corresponding derivation tree is shown.

A context-free hyperedge replacement grammar G = (T, N, P, S)
consists of

— a finite set of terminal edge labels T

— a finite set of nonterminal edge labels N

— a finite set of productions P where each production has one hyperedge
labeled with a nonterminal label on its left hand side

— a starting hyperedge labeled S.

The productions given in Figure 3 are part of a hypergraph grammar with
terminal symbols {a, b, c}, nonterminal symbols {A, S} and start symbol S.

A context-free hyperedge replacement grammar is a string generating
grammar if the language generated by the grammar consists only of string
graphs. This is the case in Figure 3. The language $a^n b^n c^n$ cannot be generated
by a context-free Chomsky grammar.

Please note that for the rest of this paper we assume that the string
generating hypergraph grammars are reduced, cycle-free and ε-free [9]. There are
no unconnected nodes. If a hyperedge replacement grammar generates a string
language, the start symbol must have one source and one target tentacle. Each node
Fig. 5. The derivation tree for the derivation given in Fig. 4. The dotted lines mark
the nodes that must be matched.




Fig. 6. Hyperedge replacement productions for Er hat schnell gearbeitet (He (has)
worked fast).



is connected to at most one source tentacle and one target tentacle; otherwise
no string language will be generated [10].

In the next section, a natural language example for context-free string
generating hyperedge replacement grammars is provided.



3 A Natural Language Example 

The productions shown in Figure 6 are necessary to generate example (1) Er
hat schnell gearbeitet. (He (has) worked fast.). Most of them resemble the usual
grammar productions used in Chomsky grammars for natural languages. S →
NP VP splits a sentence S into a noun phrase (NP) and a verbal phrase (VP).
In our example, the noun phrase becomes the personal pronoun er (he). The
verbal phrase has two parts, an adverb Adv and the verb V. This is the most
interesting rule, since the use of hypergraphs becomes evident. With their help, it
is possible to split the verb V into two parts. The two parts are separated by the
adverb. On the right hand side of the corresponding rule, the hyperedge labeled
V modeling the verb has two source nodes and two target nodes. Figuratively
speaking, one enters the verbal phrase through b1, leaves it for the adverb through
f1, reenters for the second half of the verbal phrase through b2, and leaves again
through f2. Finally, the verb has to be derived into its two halves, the auxiliary
hat (has) and the past participle gearbeitet (worked). Adv is derived into schnell
(fast).
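
The last steps of this derivation can be replayed with a small Python sketch of hyperedge replacement. It is only an illustration under stated assumptions: the data classes, the function names replace and spell, and the inner node names p0 to p3 are hypothetical, and the right hand sides are transcribed from the description above rather than from Fig. 6 itself.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Hyperedge:
    label: str
    sources: tuple
    targets: tuple


@dataclass(frozen=True)
class Hypergraph:
    nodes: tuple
    edges: tuple
    ext_sources: tuple
    ext_targets: tuple


_fresh = count()  # supplies unused names for inner nodes of a right hand side


def replace(host, edge, rhs):
    """Replace `edge` in `host` by `rhs`; both must have the same type."""
    if (len(edge.sources), len(edge.targets)) != (len(rhs.ext_sources), len(rhs.ext_targets)):
        raise ValueError("hyperedge and hypergraph must have the same type")
    # The i-th attachment node of the removed edge stands in for the
    # i-th external source or target node of the right hand side.
    glue = dict(zip(rhs.ext_sources, edge.sources))
    glue.update(zip(rhs.ext_targets, edge.targets))
    inner = []
    for v in rhs.nodes:
        if v not in glue:
            glue[v] = ("fresh", next(_fresh))
            inner.append(glue[v])
    edges = list(host.edges)
    edges.remove(edge)  # remove exactly one occurrence of the edge
    for e in rhs.edges:
        edges.append(Hyperedge(e.label,
                               tuple(glue[v] for v in e.sources),
                               tuple(glue[v] for v in e.targets)))
    return Hypergraph(host.nodes + tuple(inner), tuple(edges),
                      host.ext_sources, host.ext_targets)


def spell(g):
    """Read the labels of a string graph from its external source node."""
    nxt = {e.sources[0]: e for e in g.edges}
    node, out = g.ext_sources[0], []
    while node in nxt:
        out.append(nxt[node].label)
        node = nxt[node].targets[0]
    return out


# Right hand side of the VP rule: V is entered twice, around the adverb.
vp = Hypergraph(("p0", "p1", "p2", "p3"),
                (Hyperedge("V", ("p0", "p2"), ("p1", "p3")),
                 Hyperedge("Adv", ("p1",), ("p2",))),
                ("p0",), ("p3",))
# Right hand side of the V rule: two separate halves.
v_rhs = Hypergraph(("b1", "f1", "b2", "f2"),
                   (Hyperedge("hat", ("b1",), ("f1",)),
                    Hyperedge("gearbeitet", ("b2",), ("f2",))),
                   ("b1", "b2"), ("f1", "f2"))
adv_rhs = Hypergraph(("b1", "f1"), (Hyperedge("schnell", ("b1",), ("f1",)),),
                     ("b1",), ("f1",))

g = replace(vp, vp.edges[0], v_rhs)
g = replace(g, next(e for e in g.edges if e.label == "Adv"), adv_rhs)
print(spell(g))  # ['hat', 'schnell', 'gearbeitet']
```

A single replacement of the V edge places both halves of the verb at once, which is the point of modeling the discontinuous constituent with one symbol.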

4 Related Work 

There are many occurrences of discontinuous constituents in natural languages,
such as parenthetical placement, right node raising, relative clause extraposition,
scrambling and heavy NP shift. It is not possible to ignore these phenomena
or to always use a work-around. A formalism is needed that can handle these
problems. The most famous German treebank, the Negra corpus [11], and the
largest English treebank, the Penn Treebank [12], contain notations for
discontinuous constituents. Several proposals have been made on how to extend
context-free grammars. Chomsky himself suggested transformations that move
constituents around in addition to grammars [13]. Transformations should help
to eliminate the need for discontinuous constituents. Other formalisms like
Lexical Functional Grammar [14] have a context-free backbone extended with feature
structures. Unification of feature structures after parsing ensures that
discontinuous constituents that belong together are found. This is called functional
uncertainty. Nevertheless, discontinuous constituents have to be split into
separate parts within the context-free backbone. An overview of discontinuous
constituents in HPSG (Head-Driven Phrase Structure Grammar) using a similar
mechanism can be found in [15]. Other approaches based on phrase structure
grammars separate word order from the necessary constituents [16]. These ideas
have been extended in [17] where the discontinuous phrase structure grammar
is formally defined. As an extension for Definite Clause Grammars, Static
Discontinuity Grammars are presented in [18]. In this approach, several rules can
be applied in parallel. There may be gaps between the phrases generated by the
rules. This way, several rules can handle one discontinuous constituent in
parallel. Compared to all these approaches, string generating hypergraph grammars
have one advantage: the discontinuous constituent is modelled with one symbol!
The only implemented parser for general hypergraph grammars is described in
[19]. This parser is similar to the Cocke-Kasami-Younger parser [16] for general
context-free Chomsky grammars and embedded into DiaGen, a diagram editor
generator.

5 Earley-Parsing 

The Earley parser for general context-free string grammars was first presented in 
[4] and is now widely used and has been extended in several ways [9] within natu- 
ral language processing. It implements a top-down search, avoiding backtracking 
by storing all intermediate results only once. 

When parsing with string grammars, positions at the beginning of the string, 
between the letters and at the end of the string to be parsed are numbered. When 
parsing aabbcc seven positions are necessary, ranging from 0 to 6. From position
0 to position 1 the first a can be found, from 1 to 2 the second a, etc. The
last letter c lies between 5 and 6. This numbering scheme is easily transferred
onto string hypergraphs; the nodes in the hypergraph are numbered from 0 to 6
as done at the bottom of Fig. 2. These position numbers are necessary for the
main data structure of the Earley algorithm, the chart. When parsing a string
$s_0 s_1 \ldots s_{n-1}$ consisting of n letters, the chart is an (n+1) x (n+1) table. In our
running example we have a 7 x 7 table.

Fig. 7. Two examples for “dotted” rules using a grammar rule shown in Fig. 3.

In this table, sets of chart entries are stored. Entries are never removed
from the chart and are immutable after creation. A chart entry at position (i,j)
in the chart contains information about the partial derivation trees for parsing
the substring $s_i \ldots s_{j-1}$. This information consists of the currently used grammar
production and information about the progress made in completing the subtree
given by this production. For string grammars, chart entries are visualized by
so-called dotted rules, where the dot marks the parsing progress. If the dot is
at the end of the rule’s right hand side, the chart entry is finished or inactive;
otherwise it is active, i.e. ready to accept a terminal or an inactive chart entry. This
concept can easily be transferred to hypergraphs. Two dotted hypergraph rules
are shown in Fig. 7. The node larger than the others symbolizes the dot. It
marks what parts of the right hand side of the rule have already been found. On
the left side of Fig. 7, nothing has been found yet, as the dot is at an external
source node of the graph. On the right side, a has already been found; the next
symbol that has to be found is A. If the dot, also called the current node, is one
of the external target nodes, the edge is inactive, otherwise it is active. Both
rules shown in Fig. 7 are active.

The classical Earley algorithm consists of three steps that alternate until the
possibilities to apply one of them are exhausted. These steps are called shift,
predict and complete. The algorithm processes one terminal of the input string
at a time, from left to right, by applying the shift operation. shift extends
all active chart entries that end at the position of the new terminal symbol and
expect this terminal symbol (it follows their dot). The extended chart entries are
added to the chart. complete performs the analogous operation for inactive chart
entries. Since an inactive chart entry represents a complete derivation subtree,
active chart entries that expect the left hand side symbol of the inactive entry can
be extended, advancing the dot by one position. predict is applied whenever an
active entry is inserted into the chart. If the inserted entry expects a nonterminal
symbol, predict inserts new chart entries starting at the active entry’s ending
position and carrying the expected symbol on their rule’s left hand side. If these
prediction entries become inactive during parsing, the prediction was correct and
they will be used to complete the chart entry for which they were predicted.
A successful parse has been found when a chart entry with the grammar’s start
symbol S at its rule’s left hand side has been inserted into position (0, n) (where
n is the length of the input string).
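
For comparison, the three classical steps can be sketched as a compact recognizer for ordinary context-free string grammars. This is a minimal sketch and not the parser of this paper: it assumes an ε-free grammar, stores items by their end position with the start position kept inside the item (instead of a two-dimensional table), and uses a hypothetical example grammar for $a^n b^n$. The function name earley_recognize is likewise an assumption.

```python
def earley_recognize(grammar, start, tokens):
    """Classical Earley recognizer for an epsilon-free context-free grammar.

    grammar maps each nonterminal to a list of right hand sides (tuples).
    An item (lhs, rhs, dot, origin) stored in chart[j] corresponds to a
    chart entry at position (origin, j).
    """
    n = len(tokens)
    chart = [set() for _ in range(n + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))
    for j in range(n + 1):
        agenda = list(chart[j])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                symbol = rhs[dot]
                if symbol in grammar:
                    # predict: start new entries for the expected nonterminal
                    for alternative in grammar[symbol]:
                        item = (symbol, alternative, 0, j)
                        if item not in chart[j]:
                            chart[j].add(item)
                            agenda.append(item)
                elif j < n and tokens[j] == symbol:
                    # shift: the expected terminal is the next input symbol
                    chart[j + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # complete: extend active entries that end where this one starts
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        item = (l2, r2, d2 + 1, o2)
                        if item not in chart[j]:
                            chart[j].add(item)
                            agenda.append(item)
    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[n])


# Hypothetical example grammar: S -> a S b | a b
grammar = {"S": [("a", "S", "b"), ("a", "b")]}
print(earley_recognize(grammar, "S", "aabb"))  # True
print(earley_recognize(grammar, "S", "abab"))  # False
```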

Our variant of the Earley algorithm consists of modifications of these three
main steps (procedure names differ slightly to avoid misunderstandings), and is
thus very similar to the classical Earley algorithm. The main difference lies in the
role of inactive chart entries. In the classical algorithm, an inactive chart entry is
always finished, i.e. all right hand side symbols have been matched to a part of
the input string. In our algorithm, a chart entry becomes inactive when the
current node (the dot) reaches an external target node of the right hand side hypergraph.
While this inactive chart entry will already be used for the complete step, the
chart entry is not necessarily finished, as there may be several external source
and target nodes. predict will insert continuation entries, i.e. active entries that
restart the parsing of an inactive entry through another external source node, if
it encounters a nonterminal hyperedge over which the dot has already stepped
before.

In our extension of the Earley algorithm, a chart entry e consists of the
following information:

— rule(e), the hypergraph rule that is used

— currentnode(e), the dot inside the rule’s right hand side

— entrynode(e), one of the external source nodes of the rule

— from(e), the index of the first symbol of the input string that is covered by
this chart entry

— to(e), the index of the last covered symbol of the input string plus one

— predecessor(e)⁴, the active chart entry extended to create e, or null

— continuation-of(e)⁵, an inactive chart entry that has been continued by
this edge or one of its predecessors, or null

— parts(e,h), for any hyperedge h that is part of the rule’s right hand side,
either a chart entry describing a derivation of this edge, or null

The entry node is the external source node used to “enter” the hypergraph
during parsing. In Fig. 7 the entry node is both times b1. Both from and to are
integers between 0 and n, where n is the length of the input string. The substring
ranging from $s_{from}$ to $s_{to-1}$ has been matched to a path in the hypergraph
between the entry node and the current node. E.g. the chart entry from 3 to 5
encompasses $s_3 s_4$ of the string to be parsed.

parts(e,h) is defined for hyperedges h that are part of the right hand side
hypergraph of rule(e) and for which some derivation has been found during the
parsing process up to now. Its value is the terminal or the chart entry
representing the derivation of this edge. It is set during the complete step.
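
One possible way to represent such chart entries is an immutable record. The following Python sketch is an assumption made for illustration, not the data structure of the implementation mentioned in this paper: the class names Rule and ChartEntry, the field name from_ (from is a reserved word in Python), and the decision to exclude predecessor from equality checks are all hypothetical choices.

```python
from dataclasses import dataclass, field, replace, FrozenInstanceError
from typing import Optional, Tuple


@dataclass(frozen=True)
class Rule:
    lhs: str                       # label of the left hand side hyperedge
    ext_sources: Tuple[str, ...]   # external source nodes of the right hand side
    ext_targets: Tuple[str, ...]   # external target nodes of the right hand side


@dataclass(frozen=True)
class ChartEntry:
    rule: Rule
    currentnode: str               # the dot
    entrynode: str                 # external source node used to enter the rule
    from_: int                     # index of the first covered input symbol
    to: int                        # index of the last covered symbol plus one
    # Assumption: predecessor is kept for visualization only, so it is
    # ignored when two entries are compared for the "already in chart" test.
    predecessor: Optional["ChartEntry"] = field(default=None, compare=False)
    continuation_of: Optional["ChartEntry"] = None
    parts: Tuple[Tuple[str, object], ...] = ()  # pairs (hyperedge name, part)

    def is_inactive(self):
        """An entry is inactive when the dot sits on an external target node."""
        return self.currentnode in self.rule.ext_targets

    def covered(self, tokens):
        """The substring s_from ... s_(to-1) matched so far."""
        return tokens[self.from_:self.to]


rule = Rule("A", ext_sources=("b1", "b2"), ext_targets=("f1", "f2"))
active = ChartEntry(rule, currentnode="b1", entrynode="b1", from_=3, to=3)
# Entries are immutable, so extending one creates a new entry.
grown = replace(active, currentnode="f1", to=5, predecessor=active)
print(grown.is_inactive(), grown.covered("aabbcc"))  # True bc
```

The entry from 3 to 5 covers the two symbols $s_3 s_4$, matching the example in the text. Note that grown is inactive although the second external source node b2 has not been used yet.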

⁴ Actually, only either parts or predecessor is necessary for the algorithm. We use parts
for algorithmic purposes, but include predecessor for the visualization of the chart.

⁵ ‘of’ refers to the function’s value, not its parameter.
We will now introduce an Earley-style algorithm for parsing strings generated
by a hypergraph grammar with the properties mentioned at the end of section 2.
While we explain the algorithm in detail, a running example will be provided in
Figures 8 to 14. The example visualizes all chart entries created when parsing
aabbcc with the grammar given in Fig. 3. The chart entries are numbered for
convenience. For each chart entry e, first from(e) and to(e) are given, followed by
rule(e) and entrynode(e). currentnode(e) is given indirectly by drawing this node
larger than the others. The current node is set to the entry node of the right
hand side in the example. The brackets () following each terminal or nonterminal
edge h represent parts(e,h). These brackets are empty if no parts have been found
yet, as in Fig. 8. The last three columns are useful to visualize how a chart entry
is created: continuation-of points to the inactive chart entry of which the current
entry is a continuation entry, predecessor points to the active chart entry from
which the current entry has been grown in the complete step. In the last column,
the operation that created the chart entry is stated.

The main procedure, parse, returns all possible derivation trees for a given
string of terminals and the start symbol S.⁶

procedure parse:
    returns: trees, a set of derivation trees
    parameters: S, a nonterminal label
                input, a string of terminal symbols
    trees := {}
    for all rules r where label(left-hand-side(r)) = S
        p := initial-prediction(r)
        if p is not in chart[0, 0]
            insert p into chart[0, 0]
            predict-for(p)
    for i = 0 .. length(input)-1
        shift(i, input[i])
    for all inactive entries e in chart[0, length(input)]
        if label(left-hand-side(rule(e))) = S
            insert generate-tree(e) into trees
    return trees

Since we are using a top-down parsing approach, it is necessary to recursively
predict the leftmost parts of possible derivation trees, starting with the start
symbol S, and using the procedure predict-for (detailed below).

initial-prediction(r) creates a chart entry from 0 to 0 using rule r. Its
current node is set to the only possible entry node, since S is of the type (1,1).
parts is null. In Fig. 8, two entries are inserted for the two S rules of the grammar.
predecessor and continuation-of are null.

The main part of parsing happens in the shift-loop: one terminal symbol at 
a time is used to complete existing, active chart entries. Further predictions and 
completions are performed recursively, as detailed below in the procedure shift. 
Finally, each derivation tree we found during a successful parse is represented 



⁶ The indentation of the pseudo-code marks the end of loops and alternatives.
Fig. 8. Chart entries for strings ending at node 0 when parsing aabbcc.



by a chart entry with the label S from 0 to n. The corresponding derivation tree 
is trivially built by recursion on the entry’s parts. 

Next, the function predict-for is explained:

procedure predict-for:
    returns: nothing
    parameters: e, an active chart entry
    h := hyperedge following currentnode(e)
    if label(h) is a terminal label, abort
    if parts(e,h) is defined
        // we have reached a hyperedge that has already been traversed
        c := generate-continuation(parts(e,h), e)
        if c is not in chart[to(e), to(e)]
            insert c into chart[to(e), to(e)]
            predict-for(c)
    else
        for all rules r where label(left-hand-side(r)) = label(h)
            c := generate-prediction(e,r)
            if c is not in chart[to(e), to(e)]
                insert c into chart[to(e), to(e)]
                predict-for(c)

predict-for inserts prediction entries for an active chart entry e that 
expects a nonterminal symbol. What e expects to parse next is determined by 
looking at the hyperedge h following the dot. In the case of a terminal, no 
prediction is necessary, since the entry will be completed if the matching terminal 
is shifted. 

If h has not been used for parsing before (parts(e,h) = null), we proceed
analogously to the classical Earley algorithm:

The function generate-prediction returns a new active chart entry c that
may become the root of a possible derivation tree of the hyperedge following
currentnode(e), using the rule r. No part of it has yet been matched to the
input string. c starts and ends at to(e), rule(c) := r, and currentnode(c) is set to
the begin node that corresponds to currentnode(e), using the unique mapping
between a hyperedge’s nodes and its replacement hypergraph’s nodes. parts(c,x)
is undefined for all hyperedges x. predecessor and continuation-of are null. This
entry is inserted into the chart.

Fig. 9. Chart entries for strings ending at node 1 when parsing aabbcc.

In Fig. 9, two such prediction entries are shown. In chart entry 3, after having 
shifted over the first a in the input string, the hyperedge following the current 
node is labeled A. Two chart entries must be predicted for the two rules with 
left hand side A. 

A phenomenon that is new compared to parsing with context-free Chomsky
grammars (e.g. with the classical Earley algorithm) is that multiple, separate
substrings may form a derivation of the same nonterminal symbol. In order to cope
with this, we introduce the concept of a continuation chart entry. When,
during parsing, the current node reaches a hyperedge of the right hand side
hypergraph to which a portion of the input string has already been matched, we
predict an active chart entry that is consistent with the last chart entry that
represented this nonterminal. generate-continuation returns such a
continuation entry that represents an already partially matched possible derivation
of h. This nonterminal hyperedge has already been “entered” once through a
source node and has been “left” again through a target node. Now this
hyperedge is “reentered” through another source node. The chart entry c returned
by generate-continuation(p,e) starts and ends at to(e), rule(c) := rule(p),
parts(c,x) := parts(p,x) for all x, predecessor(c) := null, and c is a continuation
of p. currentnode(c) is determined the same way as above.

Fig. 10. Chart entries for strings ending at node 2 when parsing aabbcc.

In Fig. 13, the chart entry 15 is predicted from the chart entry 14. In chart
entry 14 the hyperedge A is reentered through its second source node. A(11)
states that this hyperedge has been handled before in chart entry 11, whose
continuation will now be predicted.

Next, shift and complete-with are explained:

procedure shift:
    returns: nothing
    parameters: position, a non-negative integer
                t, a terminal symbol
    for all active chart entries e where to(e) = position
        h := hyperedge following currentnode(e)
        if (label(h) = t)
            insert completion(e,t) into chart[from(e), to(e)+1]
            if completion(e,t) is inactive
                complete-with(completion(e,t))
            else
                predict-for(completion(e,t))

Fig. 11. Chart entries for strings ending at node 3 when parsing aabbcc.

Fig. 12. Chart entries for strings ending at node 4 when parsing aabbcc.



procedure complete-with:
    returns: nothing
    parameters: ia, an inactive chart entry
    for all active entries e where to(e) = from(ia)
        if expects(e,ia)
            insert completion(e,ia) into chart[from(e), to(ia)]
            if completion(e,ia) is inactive
                complete-with(completion(e,ia))
            else
                predict-for(completion(e,ia))

shift and complete-with perform similar operations: they try to extend 
active chart entries with a new part. If this completion is possible (see below), 
the resulting chart entry represents a larger part of the input string and is 
inserted into the chart. 

If the completed entry is inactive, other chart entries can be completed with 
it. If it is active (expecting more input), we insert prediction entries. 
Fig. 13. Chart entries for strings ending at node 5 when parsing aabbcc.

Fig. 14. Chart entries for strings ending at node 6 when parsing aabbcc.



The function expects(e,ia) is extended compared to the original Earley
algorithm. expects(e,ia) determines if a given inactive edge ia will be accepted
for completion of e. Please note that parsing of an inactive edge is not necessarily
finished; an edge is inactive if the current node, the dot, has reached a target node
of the rule. If the label or type of the left hand side of ia’s rule differs from e’s
expected nonterminal edge label or type, expects(e,ia) is false. If the node used
to enter ia, entrynode(ia), does not correspond to currentnode(e), expects(e,ia)
is false. And if ia is a continuation chart entry, but the inactive entry that has
been continued does not match parts(e,h), ia represents a different derivation of
the hyperedge than the one we assumed the last time it was traversed; therefore,
expects(e,ia) is false. It is true otherwise.
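
The three conditions can be written down as a short Python sketch. It is only an illustration under several assumptions: the classes Edge, Rule and Entry and the helper edge_after are hypothetical, a rule's left hand side edge is assumed to list the external nodes of its right hand side as its tentacles, and "corresponds to" is read as "sits at the same tentacle index". The example rules follow the description of the German example in section 3, with invented inner node names n1 and n2.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Edge:
    label: str
    sources: Tuple[str, ...]
    targets: Tuple[str, ...]


@dataclass(frozen=True)
class Rule:
    lhs: Edge                 # its tentacles name the external nodes of rhs
    rhs: Tuple[Edge, ...]


@dataclass(frozen=True)
class Entry:
    rule: Rule
    currentnode: str
    entrynode: str
    continuation_of: Optional["Entry"] = None
    parts: Tuple[Tuple[Edge, "Entry"], ...] = ()


def edge_after(e):
    """The hyperedge following the dot (unique in a string generating rule)."""
    for h in e.rule.rhs:
        if e.currentnode in h.sources:
            return h
    return None


def expects(e, ia):
    h = edge_after(e)
    if h is None:
        return False
    lhs = ia.rule.lhs
    # 1. label and type of ia's left hand side must match the expected edge
    if (h.label, len(h.sources), len(h.targets)) != \
            (lhs.label, len(lhs.sources), len(lhs.targets)):
        return False
    # 2. ia must have been entered through the tentacle the dot is on
    if lhs.sources.index(ia.entrynode) != h.sources.index(e.currentnode):
        return False
    # 3. a continuation must continue the derivation stored in parts(e,h)
    if ia.continuation_of is not None and ia.continuation_of != dict(e.parts).get(h):
        return False
    return True


v_edge = Edge("V", ("b1", "n2"), ("n1", "f1"))
vp = Rule(Edge("VP", ("b1",), ("f1",)), (v_edge, Edge("Adv", ("n1",), ("n2",))))
v = Rule(Edge("V", ("b1", "b2"), ("f1", "f2")),
         (Edge("hat", ("b1",), ("f1",)), Edge("gearbeitet", ("b2",), ("f2",))))

first_half = Entry(v, currentnode="f1", entrynode="b1")
second_half = Entry(v, currentnode="f2", entrynode="b2", continuation_of=first_half)
at_start = Entry(vp, currentnode="b1", entrynode="b1")
after_adv = Entry(vp, currentnode="n2", entrynode="b1", parts=((v_edge, first_half),))

print(expects(at_start, first_half))    # True
print(expects(at_start, second_half))   # False: wrong entry node
print(expects(after_adv, second_half))  # True: continues the stored part
```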

The function completion(e,x) creates a new chart entry, either by accepting
a terminal symbol x, or by accepting an inactive edge x into the partial derivation
tree. Let h be the hyperedge following the current node inside e. The new chart
entry c := completion(e,x) is identical to e, except for the following modifications:
The substring of the input covered by c reaches from the start of the active entry
e, from(e), to the end of the inactive entry x, to(x), or to to(e)+1 for a terminal
label x. predecessor(c) is set to e. parts(c,h) is set to x, i.e. c’s derivation consists
of the same subtrees as e’s except for the new part x. If x is a continuation
edge, parts(e,h) is already defined; the change of definition is intended, since all
information held by parts(e,h) is also held by x. Furthermore, the current node
pointer is advanced over h, using currentnode(x) to determine the correct target
node of h, unless x is a terminal symbol. This is possible because of the unique
mapping between the source and target nodes of h and the source and target
nodes of the replacement hypergraph.

In Fig. 9, the shift over the first a of the string to be parsed is shown. 
The dot moves over the hyperedge labeled a, the entrynode is still 1 and the 
predecessor is the first chart entry. In Fig. 11, a complete-with is shown. In 
chart entry 12, the hyperedge labeled A is completed the first time. The inactive 
chart entry 11 is taken by the active chart entry 3. 

The complexity of our algorithm is comparable to the underlying, classical
Earley algorithm, $O(n^3)$. The usage of more complicated data structures
introduces a constant penalty factor on space and time complexity in several places
(or a factor of $O(k)$ where k is the maximum number of right hand side
hyperedges in a grammar; but we regard k as a constant). Since the only point
in our algorithm where it distinctively differs from classical Earley, aside from
transferring its concepts onto string generating hypergraph grammars, is the
prediction of continuation entries, and only one such continuation entry is
inserted during prediction (instead of several prediction entries), the complexity
class of the algorithm itself does not increase [10].



6 Conclusion 

String generating hypergraph grammars are a theoretical concept introduced in
[5]. Context-free hyperedge replacement can model string languages that are not
context-free in the usual Chomskian sense. In this paper an Earley-based parser
was presented for string generating hypergraph grammars. The major extension
compared to the original Earley algorithm is the introduction of inactive chart
entries that can be activated again. This is the case when a hypergraph was
“entered” through one external source node, “left” through an external target
node and “reentered” through a new external source node. The parser
described has been implemented in Java [10]. A German grammar and lexicon
is currently being developed.

This parser can be extended in many ways as shown in [9]. First, it is
interesting to add an agenda and implement bottom-up parsing or parsing with
probabilities. For linguistic applications, it is useful to attribute hyperedges and
hypergraphs with feature structures that are combined with unification.
Abstract. We propose to study two infinite graph transformations that
we respectively call bounded and unbounded path transduction. These 
graph transformations are based on path substitutions and graph prod- 
ucts. When graphs are considered as automata, path transductions cor- 
respond to rational word transductions on the accepted languages. They 
define strict subclasses of monadic transductions and preserve the decid- 
ability of monadic second order theory. 

We give a generalization of the Elgot and Mezei composition theorem 
from rational word transductions to path transductions. 



Introduction 

As a theoretical model for systems verification, the study of infinite transition
graphs has become an active research area. The most fundamental dynamic
behavior of these (oriented, labelled and simple) graphs is the language of their
path labels: a class of automata corresponds to each class of graphs.

To be interesting for formal verification, an infinite graph is also expected to 
have some decidable properties like reachability or a decidable monadic second 
order theory. These decidability results for graphs are often strongly connected 
with the classes of languages recognized by the corresponding automata: graph 
hierarchies share many properties with classical language hierarchies [16]. 

A standard way to show a property on a given infinite structure is to define 
this structure by transformations from a known generator. If the transforma- 
tions used are well suited, the property will be inherited from the generator. A 
transformation which preserves a given property is said to be compatible with this 
property. 

For example, monadic transductions [7] and unfolding [8] are compatible with 
monadic second order logic: any graph which is built by these transformations 
from a finite graph will have a decidable monadic theory [5]. Monadic second 
order logic is very expressive and thus undecidable for many graphs. For these 
graphs one must resign oneself to weaker decidability results like reachability or 
reachability under rational control. To study such properties, one must consider 
graph transformations which are strictly weaker than monadic transductions. 

This article is the continuation of [17] which extended to infinite graphs the 
classical language theoretic notion of abstract family [11,12]. Many families of 
graphs, like higher order pushdown graphs [5], Petri net transition graphs or 
automatic graphs [3] can be characterized by using a common and restricted set 
of simple graph transformations from specific generating systems [18]. 

Abstract families are a way to build generic proofs: we consider families 
of graphs without specifying what the graphs are. We only require that the 
families are closed under certain operations. In language theory, rational word 
transductions [12, 2] are one of these fundamental operations. 

Rational transductions have been naturally extended from words to trees [15] 
but no convenient automaton based definition of graph transduction has been 
given. In this article we study two graph transformations that we respectively 
call bounded path transduction and unbounded path transduction. These trans- 
formations are defined neither in terms of finite automata nor in terms of logic 
but rather in terms of path substitution, path morphism and graph product. 
They define strict subclasses of monadic transductions [7] which preserve the 
decidability of monadic second order theory, but their principal characteristic 
is that they respect the orientation of edges and are thus compatible with 
language transformations. 

Given a path transduction $T$, we can build a rational word transduction $\overline{T}$ 
such that for any automaton $\mathcal{G}$ (finite or not) we have: 

$L(T(\mathcal{G})) = \overline{T}(L(\mathcal{G}))$. 

Conversely, given a rational word transduction $R$, we easily build a path trans- 
duction $T$ verifying $R = \overline{T}$. 

The most important and new result is the generalization of the Elgot and 
Mezei composition closure theorem from rational word transductions to path 
transductions. 

This article is divided into four parts: the first part recalls some basic def- 
initions about graphs, automata and rational word transductions; the second 
one introduces the notion of (language) compatible transformation and gives a 
formal definition of path transductions; the third one is devoted to bounded 
path transductions and the last to unbounded path transductions. 

1 Preliminaries 

1.1 Graphs and Automata 

We use the standard definition of graphs and automata but here the state space 
is not supposed to be finite. 

Definition 1. A $\Sigma$-$\Gamma$-graph structure (or more simply a $\Sigma$-graph or a graph) 
is a tuple $(\Sigma, \Gamma, Q, (R_a)_{a \in \Sigma}, (P_b)_{b \in \Gamma})$ where: 

— $\Sigma$ is an alphabet of edge labels; 

— $\Gamma$ is an alphabet of state labels; 

— $Q$ is a countable set of states (or vertices); 

— for each label $a \in \Sigma$, $R_a \subseteq Q \times Q$ is the relation labelled by $a$; 

— for each label $b \in \Gamma$, $P_b \subseteq Q$ is the unary predicate (the set) labelled by $b$. 

Recall that the composition of two relations $R$ and $S$ on $Q$ is the relation 

$R \cdot S = \{(p, q) \in Q \times Q \mid \exists r \in Q,\ pRr,\ rSq\}$. 

The mapping which associates to each symbol $a \in \Sigma$ its relation $R_a$ is extended 
to words according to the following rules: 

$R_\varepsilon = \{(p, p) \mid p \in Q\}$ ($\varepsilon$ denotes the empty word), 

$R_{au} = R_a \cdot R_u$ for $a \in \Sigma$, $u \in \Sigma^*$. 

We will also consider the relation labelled by a set $L$ of words: 

$R_L = \{(p, q) \mid \exists w \in L,\ pR_w q\}$. 

The path label language of $\mathcal{G}$ between two sets of vertices $A$ and $B$ is the set 
of finite words 

$L(\mathcal{G}, A, B) = \{w \in \Sigma^* \mid R_w \cap A \times B \neq \emptyset\}$. 

An automaton is a graph structure with two unary predicates: $P_i$ for the initial 
states and $P_f$ for the final ones. The language $L(\mathcal{A})$ of an automaton $\mathcal{A}$ is the 
path label language $L(\mathcal{A}, P_i, P_f)$. 
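
The definitions above can be tried out on finite graphs. The following is only a minimal sketch, assuming a graph is stored as a dictionary from edge labels to sets of state pairs; the names `compose`, `relation_of_word` and `accepts`, and the two-state example automaton, are illustrative assumptions and do not come from the article.

```python
def compose(r, s):
    """Relational composition R . S = {(p, q) | exists m with pRm and mSq}."""
    return {(p, q) for (p, m) in r for (n, q) in s if m == n}


def relation_of_word(edges, states, word):
    """Relation R_w of a word w.

    Starts from R_epsilon (the identity on the states) and composes with the
    relation of each letter in turn; by associativity this agrees with the
    inductive rule R_{au} = R_a . R_u.
    """
    rel = {(p, p) for p in states}
    for letter in word:
        rel = compose(rel, edges.get(letter, set()))
    return rel


def accepts(edges, states, initial, final, word):
    """w belongs to L(G, A, B) iff R_w meets A x B."""
    return any(p in initial and q in final
               for (p, q) in relation_of_word(edges, states, word))


# Hypothetical automaton: 0 --a--> 1, with a b-loop on state 1.
states = {0, 1}
edges = {"a": {(0, 1)}, "b": {(1, 1)}}
print(accepts(edges, states, {0}, {1}, "abb"))
print(accepts(edges, states, {0}, {1}, "ba"))
```

The dictionary representation only covers finite graphs; the article's graphs may have countably many states.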

1.2 Word Transductions 

Rational word transductions are fundamental tools for the study of formal lan- 
guages. We give here some basic definitions of this concept. See [1,2,12] for 
further details. 

Definition 2. A rational transduction between $\Sigma_1^*$ and $\Sigma_2^*$ is a rational subset 
of the monoid $\Sigma_1^* \times \Sigma_2^*$ for the canonical concatenation $(u, v) \cdot (w, x) = (uw, vx)$. 

A rational transduction is usually defined, as in Figure 1, by a finite automa- 
ton, called a transducer, which is labelled by pairs of words. An equivalent way 
to define such a relation is to give its morphism decomposition: 

Property 1. Any rational transduction $R$ is the composition of an inverse mor- 
phism $f^{-1}$, an intersection with a rational language $K$, and a direct morphism 
$g$: 

$R = f^{-1} \cdot (\cap K) \cdot g = \{(f(w), g(w)) \mid w \in K\}$. 

A relation $R$ between $\Sigma_1^*$ and $\Sigma_2^*$ is faithful when for any $w \in \Sigma_2^*$, $R^{-1}(w)$ is 
finite; it is continuous when $R^{-1}(\{\varepsilon\}) = \{\varepsilon\}$. These two properties are important 
for the study of quasi-real-time acceptors [10] because a transduction which is 
both faithful and continuous is unable to erase more than a bounded number of 
letters. 

We give here a representation of faithful and continuous rational transduc- 
tions which was given in [4]. Recall that a morphism $\alpha$ from $\Sigma_1^*$ to $\Sigma_2^*$ is strictly 
alphabetic when $\alpha(a) \in \Sigma_2$ for any $a \in \Sigma_1$. 




[Figure 1: transducer diagram whose edges are labelled by pairs of words.] 

Fig. 1. A transducer recognizing the function $a^n b^m \mapsto a^{n+1} b^m$. 

Theorem 1. (Boasson and Nivat) A rational transduction $R$ between $\Sigma_1^*$ and 
$\Sigma_2^*$ is faithful and continuous if and only if there exists an alphabet $\Pi$, a ratio- 
nal set $K \in Rat(\Pi^*)$, a morphism $f$ from $\Pi^*$ to $\Sigma_1^*$ and a strictly alphabetic 
morphism $\alpha$ from $\Pi^*$ to $\Sigma_2^*$ such that 

$R = \{(f(w), \alpha(w)) \mid w \in K\}$. 

The Elgot and Mezei composition theorem is an important property of ra- 
tional transductions. 

Theorem 2. (Elgot and Mezei) 

The rational transductions are closed under composition. 

A proof of this result can be found in [1] or [2] . 

Because faithfulness and continuity are preserved by composition we have 
the following corollary. 

Corollary 1. Faithful and continuous rational transductions are closed under 
composition. 

Rational transductions and their closure under composition are fundamental 
in the theory of abstract families of languages (AFL). Figure 2 gives some of 
the most usual closure operators and the names of the corresponding abstract 
families. See [12,2] or [10] for further details on this theory. 

2 Language Compatible Graph Transformations: Path Transductions 

We study graph transformations which are independent of the vertex naming 
convention or, in other words, invariant with respect to graph isomorphism. 
Another fundamental restriction is that we only consider graph transformations 
that are “compatible” with language transformations. 

Definition 3. A graph transformation $T$ of arity $n$ is compatible (with language 
transformations) if there exists a language transformation $\overline{T}$ of arity $n$ such that 
for any sequence $(\mathcal{A}_i)_{1 \le i \le n}$ of automata, we have 

$L(T(\mathcal{A}_1, \ldots, \mathcal{A}_n)) = \overline{T}(L(\mathcal{A}_1), \ldots, L(\mathcal{A}_n))$. 

Many graph transformations are impossible to translate into language trans- 
formations. Consider, for example, the simple transformation B consisting of 
adding a backward edge labelled by a for each edge labelled by an a. The 
non-compatibility of this transformation is illustrated by Figure 3. 

[Figure 2: diagram of language families ordered by closure operators. Operators 
shown: inverse morphism, intersection with rational sets, non-erasing morphism, 
erasing morphism, Kleene closure ($L \cup M$, $L^+$). Families shown: any language 
family, cylinder, trio, full-AFL.] 

Fig. 2. Different abstract families of languages from the least constrained to the most 
constrained. (1) stands for rational transductions and (2) for faithful and continuous 
rational transductions. 

[Figure 3: two finite automata A and B and their images by B.] 

Fig. 3. The two finite automata A and B both accept the language {a, b}, but 
their respective images by B do not accept the same languages. 



2.1 Path Morphisms and Path Substitutions 

A morphism is a mapping which replaces letters by words. More generally, a 
substitution is a relation which replaces letters by languages. Morphisms, in- 
verse morphisms, substitutions and inverse substitutions are well-known language 
transformations but these transformations can be generalized as graph/automata 
transformations. 



Definition 4. Let $\Sigma_1$ and $\Sigma_2$ be alphabets and $h \subseteq \Sigma_1 \times \Sigma_2^*$ be a relation. 
The substitution generated by $h$ is the relation $\bigcup_{n \ge 0} h_n$ defined inductively 
by: 

$h_0 = \{(\varepsilon, \varepsilon)\}$ 

$h_{n+1} = \{(au, vw) \in \Sigma_1^{n+1} \times \Sigma_2^* \mid a \in \Sigma_1,\ v \in h(a),\ w \in h_n(u)\}$. 

We use the same symbol to denote the substitution $\bigcup_{n \ge 0} h_n$ and its generator $h$. 

Example 1. If $h$ is the rational substitution defined by $h(c) = a$ and $h(d) = (ab)^*$, 
the image of the word $cdcd$ by $h$ is the rational language $a(ab)^*a(ab)^*$. By inverse 
we have $h^{-1}(abab) = \{w \mid abab \in h(w)\}$ hence $h^{-1}(abab) = d^+$. 
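
Example 1 can be checked mechanically on short words. The sketch below is an illustration only: it encodes each $h(x)$ as a regular expression and enumerates candidate preimages up to a length bound. The names `H`, `image_regex`, `in_image` and `preimage` are assumptions introduced here, and the bounded search only confirms the claim $h^{-1}(abab) = d^+$ up to the chosen length.

```python
import re
from itertools import product

# Assumed encoding of the substitution of Example 1 as regular expressions.
H = {"c": "a", "d": "(?:ab)*"}


def image_regex(word):
    """Regular expression denoting h(word): concatenate the images of the letters."""
    return "".join(H[x] for x in word)


def in_image(target, word):
    """True when target belongs to h(word)."""
    return re.fullmatch(image_regex(word), target) is not None


def preimage(target, max_len):
    """All words w over {c, d} of length <= max_len with target in h(w)."""
    return {"".join(w)
            for n in range(max_len + 1)
            for w in product("cd", repeat=n)
            if in_image(target, "".join(w))}


print(image_regex("cdcd"))
print(sorted(preimage("abab", 4)))
```

On words of length at most 4 the preimage of $abab$ consists exactly of the non-empty powers of $d$, in line with the example.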

When the relation is a function, which associates a unique word to each letter, 
the generated substitution is a morphism; when the image of each letter is a 
rational (resp. finite) language, the substitution is called rational (resp. finite). 

Inverse rational substitution is a simple and well defined operation on graphs. 
Each path labelled by a word in the language $h(a)$ is replaced by an arc labelled 
by $a$. This path transformation does not add vertices to the graph. 

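Definition 5. (Caucal [6]) 

If $h$ is a substitution from $\Sigma_1^*$ to $\Sigma_2^*$ and $\mathcal{G}$ is a graph, then $h^{-1}(\mathcal{G})$ is the graph 
$(\Sigma_1, \Gamma^{\mathcal{G}}, Q^{\mathcal{G}}, (R_a^{h^{-1}(\mathcal{G})})_{a \in \Sigma_1}, (P_b^{\mathcal{G}})_{b \in \Gamma^{\mathcal{G}}})$ where 

$R_a^{h^{-1}(\mathcal{G})} = R_{h(a)}^{\mathcal{G}}$ for each $a \in \Sigma_1$. 

[Figure 4: an automaton $\mathcal{A}$ with a path labelled abab, and the graph obtained 
from it, with additional edges labelled d.] 

Fig. 4. An automaton $\mathcal{A}$ and its inverse rational substitution according to $h(c) = a$ 
and $h(d) = (ab)^*$. In other words $R_c^{h^{-1}(\mathcal{A})} = R_a^{\mathcal{A}}$ and $R_d^{h^{-1}(\mathcal{A})} = (R_a^{\mathcal{A}} \cdot R_b^{\mathcal{A}})^*$. 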

From the algebraic point of view this transformation is natural: the set of 
relations $\{R_w \mid w \in \Sigma_2^*\}$ is a monoid for the composition operator. When $h$ is 
rational (resp. finite), each relation $R_a^{h^{-1}(\mathcal{G})}$ is a rational (resp. finite) subset of 
this monoid. Figure 4 gives an example of inverse rational substitution. Note 
that the languages $L(h^{-1}(\mathcal{A}))$ and $h^{-1}(L(\mathcal{A}))$ coincide: inverse substitution is a 
compatible transformation. 
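
For a finite graph, the inverse substitution of Example 1 can be computed directly from the relations. The following is a minimal sketch under the same assumed dictionary representation of graphs; the five-state path used as input is a hypothetical example, and `star` is an ordinary reflexive-transitive closure rather than anything specific to the article.

```python
def compose(r, s):
    """Relational composition R . S."""
    return {(p, q) for (p, m) in r for (n, q) in s if m == n}


def star(r, states):
    """Reflexive-transitive closure of r on a finite set of states."""
    closure = {(p, p) for p in states}
    frontier = set(closure)
    while frontier:
        # Extend the newly found pairs by one more step of r.
        frontier = compose(frontier, r) - closure
        closure |= frontier
    return closure


def inverse_substitution_example(edges, states):
    """h^{-1}(G) for h(c) = a and h(d) = (ab)*.

    R_c is R_a, and R_d is (R_a . R_b)*; the state set is unchanged.
    """
    ra = edges.get("a", set())
    rb = edges.get("b", set())
    return {"c": set(ra), "d": star(compose(ra, rb), states)}


# Hypothetical input: the path 0 -a-> 1 -b-> 2 -a-> 3 -b-> 4.
states = {0, 1, 2, 3, 4}
edges = {"a": {(0, 1), (2, 3)}, "b": {(1, 2), (3, 4)}}
result = inverse_substitution_example(edges, states)
print(sorted(result["c"]))
print(sorted(result["d"]))
```

Because $\varepsilon \in (ab)^*$, every state receives a $d$-loop, and the $d$-edges otherwise connect the even positions of the path.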

Lemma 1. Let $\mathcal{G} = (\Sigma_1, \Gamma, Q, (R_a)_{a \in \Sigma_1}, (P_b)_{b \in \Gamma})$ be a graph, let $i, f$ be two 
letters of $\Gamma$ and let $g, h$ be substitutions between $\Sigma_2^*$ and $\Sigma_1^*$. 

1. $\forall w \in \Sigma_2^*$, $R_w^{h^{-1}(\mathcal{G})} = R_{h(w)}^{\mathcal{G}}$; 

2. $L(h^{-1}(\mathcal{G}), P_i, P_f) = h^{-1}(L(\mathcal{G}, P_i, P_f))$; 

3. $g^{-1}(h^{-1}(\mathcal{G})) = (g \cdot h)^{-1}(\mathcal{G})$ (by relational composition). 

Recall that a monoid morphism $\alpha$ from $\Sigma_1^*$ to $\Sigma_2^*$ is strictly alphabetic when 
$\alpha(a) \in \Sigma_2$ for any $a \in \Sigma_1$. If $\alpha$ is strictly alphabetic, we write $\alpha(\mathcal{G})$ for the graph 
obtained by replacing each $R_a$ by $R_{\alpha(a)}$. Direct strictly alphabetic mapping is, 
like inverse morphism, a special case of inverse finite substitution. 

It is harder to give for graphs a definition of direct rational substitution 
that extends cleanly the language transformation. The most natural way, given 
a continuous¹ substitution $h$, is to use an edge replacement [9] to replace each 
edge of the graph labelled by a symbol $a$ by the automaton $\mathcal{G}_a$ recognizing the 
language $h(a)$. 

[Figure: an edge labelled $a$ replaced by the automaton $\mathcal{G}_a$, with 
$L(\mathcal{G}_a, 1, 2) = h(a)$.] 

As proved in [17], such an (oriented) edge replacement may be simulated by 
using only finite inverse substitution and product. 

2.2 Graph Products 

The notion of graph product is well known in automata theory for its correspon- 
dence with the intersection of languages. 

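Definition 6. Let $\mathcal{G}$ and $\mathcal{H}$ be two graphs. The product $\mathcal{G} \times \mathcal{H}$ is the graph 
$(\Sigma^{\mathcal{G}} \cap \Sigma^{\mathcal{H}}, \Gamma^{\mathcal{G}} \cap \Gamma^{\mathcal{H}}, Q^{\mathcal{G}} \times Q^{\mathcal{H}}, (R_a)_{a \in \Sigma^{\mathcal{G}} \cap \Sigma^{\mathcal{H}}}, (P_b)_{b \in \Gamma^{\mathcal{G}} \cap \Gamma^{\mathcal{H}}})$ 

where 

$R_a = \{((p, p'), (q, q')) \mid pR_a^{\mathcal{G}}q,\ p'R_a^{\mathcal{H}}q'\}$ for each $a \in \Sigma^{\mathcal{G}} \cap \Sigma^{\mathcal{H}}$ and 
$P_b = P_b^{\mathcal{G}} \times P_b^{\mathcal{H}}$ for each $b \in \Gamma^{\mathcal{G}} \cap \Gamma^{\mathcal{H}}$. 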
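
The product and its link with language intersection can be illustrated on two small automata. This is a sketch under assumptions: graphs are dictionaries from labels to sets of state pairs, predicates are passed explicitly as sets of states, and the two automata `G` and `H` are hypothetical examples chosen here.

```python
from itertools import product as words_over


def graph_product(g, h):
    """Product graph: keep the shared labels and pair up the edges."""
    return {a: {((p, p2), (q, q2)) for (p, q) in g[a] for (p2, q2) in h[a]}
            for a in g.keys() & h.keys()}


def accepts(edges, initial, final, word):
    """Follow the word from the initial states and test for a final state."""
    current = set(initial)
    for letter in word:
        current = {q for (p, q) in edges.get(letter, set()) if p in current}
    return bool(current & set(final))


# Hypothetical automata: G accepts a*b, H accepts every word of length 2.
G = {"a": {(0, 0)}, "b": {(0, 1)}}
H = {"a": {(0, 1), (1, 2)}, "b": {(0, 1), (1, 2)}}
GH = graph_product(G, H)

# Initial and final predicates of the product are products of predicates.
initial = {(0, 0)}
final = {(1, 2)}

for n in range(4):
    for letters in words_over("ab", repeat=n):
        w = "".join(letters)
        both = accepts(G, {0}, {1}, w) and accepts(H, {0}, {2}, w)
        assert accepts(GH, initial, final, w) == both

print(accepts(GH, initial, final, "ab"))
```

The loop compares the product with the intersection of the two languages on all words of length at most 3; it is a finite sanity check, not a proof.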

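Lemma 2. Let $\mathcal{G}$ and $\mathcal{H}$ be two graphs. 

1. For any word $w$, we have: 

$R_w^{\mathcal{G} \times \mathcal{H}} = \{((p, p'), (q, q')) \mid pR_w^{\mathcal{G}}q,\ p'R_w^{\mathcal{H}}q'\}$; 

2. for any couple of letters $i, f$: 

$L(\mathcal{G} \times \mathcal{H}, P_i^{\mathcal{G}} \times P_i^{\mathcal{H}}, P_f^{\mathcal{G}} \times P_f^{\mathcal{H}}) = L(\mathcal{G}, P_i^{\mathcal{G}}, P_f^{\mathcal{G}}) \cap L(\mathcal{H}, P_i^{\mathcal{H}}, P_f^{\mathcal{H}})$; 

3. for any morphism $f$, we have $f^{-1}(\mathcal{G} \times \mathcal{H}) = f^{-1}(\mathcal{G}) \times f^{-1}(\mathcal{H})$. 

¹ If $\varepsilon \in h(a)$ we need $\varepsilon$-transitions or vertex fusion. 

2.3 Definition of Path Transductions 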

In [17], an abstract family of graphs (AFG) is defined as a graph family closed under 
inverse finite substitution and product by finite graphs. A full-AFG is an AFG 
which is also closed under inverse rational substitution. These two definitions only 
require the family to be closed under these operations, but a question remained 
open for principal AFGs (resp. principal full-AFGs). An AFG (respectively a 
full-AFG) is principal when all its elements are derived from a single graph (the 
generator) by a finite sequence of AFG (resp. full-AFG) operations. 

Do the AFG transformations generate strictly increasing (for inclusion) 
chains of graph families? If not, how many successive transformations are needed 
to obtain the whole family from its generator? 

The answer to the second question is three and the transformations obtained 
by composition of the three basic AFG transformations used are, as proved in the 
next sections, what we call path transductions. We distinguish two classes of path 
transductions: bounded path transductions, which are only local transformations, 
and unbounded path transductions, which may act more globally on the graph. 

Definition 7. 

1. A bounded path transduction is the composition of an inverse morphism, a 
product by a finite graph and a direct strictly alphabetic morphism ; 

2. an unbounded path transduction is the composition of an inverse rational 
substitution, a product by a finite graph and an inverse rational substitution. 
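
To make the first item concrete, the sketch below composes the three operations on finite graphs in the order $T(\mathcal{G}) = \alpha(f^{-1}(\mathcal{G}) \times \mathcal{H})$, which is the form used in the proof of Theorem 4. It is only an illustration under assumptions: graphs are dictionaries from labels to sets of state pairs, and the morphism `f`, the one-state graph `h_graph` and the morphism `alpha` are hypothetical examples, not the ones of Section 2.4.

```python
def compose(r, s):
    """Relational composition R . S."""
    return {(p, q) for (p, m) in r for (n, q) in s if m == n}


def relation_of_word(edges, states, word):
    """Relation R_w, starting from the identity for the empty word."""
    rel = {(p, p) for p in states}
    for letter in word:
        rel = compose(rel, edges.get(letter, set()))
    return rel


def inverse_morphism(f, edges, states):
    """f^{-1}(G): the relation of a letter x is R_{f(x)} in G."""
    return {x: relation_of_word(edges, states, w) for x, w in f.items()}


def graph_product(g, h):
    """Product graph on the shared labels."""
    return {a: {((p, p2), (q, q2)) for (p, q) in g[a] for (p2, q2) in h[a]}
            for a in g.keys() & h.keys()}


def alphabetic_image(alpha, edges):
    """alpha(G): relabel every edge, merging labels with the same image."""
    out = {}
    for letter, rel in edges.items():
        out.setdefault(alpha[letter], set()).update(rel)
    return out


def bounded_path_transduction(f, h_graph, alpha, edges, states):
    """alpha(f^{-1}(G) x H), assuming alpha is defined on every shared label."""
    return alphabetic_image(
        alpha, graph_product(inverse_morphism(f, edges, states), h_graph))


# Hypothetical data: G is the path 0 -a-> 1 -a-> 2.
states = {0, 1, 2}
edges = {"a": {(0, 1), (1, 2)}}
f = {"X": "aa", "Y": "a"}
h_graph = {"X": {("s", "s")}, "Y": {("s", "s")}}
alpha = {"X": "x", "Y": "y"}
image = bounded_path_transduction(f, h_graph, alpha, edges, states)
print(sorted(image["x"]))
print(sorted(image["y"]))
```

In this toy case the $x$-edge shortcuts each path labelled $aa$, while the $y$-edges copy the original $a$-edges.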

It is important to notice that the definition of bounded path transductions is 
a direct generalization to graphs of Boasson and Nivat’s characterization of faithful 
and continuous rational word transductions (Theorem 1). The only difference is 
that instead of using word morphisms, we use path morphisms and instead of 
using intersection with rational sets we use product with finite graphs. 

The characterization of general rational word transductions with inverse sub- 
stitution is less usual but easy to deduce from Property 1. 

Property 2. Any rational word transduction is the composition of an inverse ra- 
tional word substitution, an intersection by a rational set and an inverse rational 
substitution. 

Theorem 3. 

1. Given a word transduction $t$ we can construct an unbounded 
path transduction $T$ such that $\overline{T} = t$. 

2. Given a faithful and continuous word transduction $t$ we can construct a 
bounded path transduction $T$ such that $\overline{T} = t$. 

Proof: 

The main proof is already given in [17]. For (2) we can use Theorem 1. □ 

2.4 Examples 

Here is the example of a bounded transduction $T$, defined by a morphism $f$, a 
finite automaton $\mathcal{H}$ and a strictly alphabetic morphism $\alpha$. 

[Figure: definitions of $f$, $\mathcal{H}$ and $\alpha$. Legible entries: $f$: A ↦ a, B ↦ a, 
C ↦ aa; $\alpha$: C ↦ b, D ↦ c, E ↦ d.] 

This path transduction may also be represented directly, as in Figure 5, by 
a finite transducer. The application of the path transduction is a kind of “left 
synchronized graph product”. 

[Figure 5: transducer diagram; legible edge labels a/a, a/d, a/a.] 

Fig. 5. Another representation of $T$. 

The one letter Dyck graph, illustrated in Figure 6, is a representation of the 
positive integers with +1 and −1 operations respectively encoded by the labels $a$ 
and $\bar{a}$. This graph is a generator for the one-counter automata transition graphs. 

[Figure 6: drawing of the one letter Dyck graph.] 

Fig. 6. The one letter Dyck graph $\Delta_1$. 

By applying $T$ to $\Delta_1$, we obtain the graph of Figure 7. 



[Figure 7: drawing of the graph $T(\Delta_1)$.] 

Fig. 7. The graph $T(\Delta_1)$. 

Figure 8 gives an example of an unbounded path transduction and its application 
to $\Delta_1$. The a -^ C edge of the transducer leads to infinite degrees in the final graph. 

[Figure 8: transducer diagram and the resulting graph.] 

Fig. 8. An unbounded path transduction and its application to $\Delta_1$. 

The two letters Dyck graph, illustrated in Figure 9, is a binary tree with 
backward transitions. Its path label language from and to the “root” is the semi- 
Dyck language, a generator of context-free languages, but its “geometry” also 
generates, by path transductions, the pushdown graphs and prefix recognizable 
graphs, which are the prefix transition graphs of finite (resp. recognizable) word 
rewriting systems² [13,6]. 

We can deduce the decidability of monadic second order theory for these 
graphs from the fundamental result of Rabin [14] about the binary tree. More 
examples like Petri net transition graphs or automatic graphs can be found in 
[18]. 






Fig. 9. A generator of pushdown AFG and prefix recognizable full-AFG. This graph 
has a decidable second order monadic theory [14]. 



3 Composition of Bounded Path Transductions 

In this section we show that the classical Elgot and Mezei theorem naturally 
extends from faithful and continuous rational word transductions to bounded 
path transductions. 

² The definition of [13] uses a restriction to a reachable component and the one of [6] 
uses a restriction to rational sets of vertices. We consider here the prefix rewriting 
graphs without these restrictions. 

To establish this result, we use the basic properties of inverse morphism and 
product (Lemma 1 and Lemma 2) but the keys of the proof are the two following 
“delayed forgetting” lemmas. 

Lemma 3. If $f$ is a morphism and $\alpha$ is a strictly alphabetic morphism, then we 
can construct a morphism $g$ and a strictly alphabetic morphism $\beta$ such that for 
any graph $\mathcal{G}$, we have 

$f^{-1}(\alpha(\mathcal{G})) = \beta(g^{-1}(\mathcal{G}))$. 

Proof: We first remark that the relational composition $h = f \cdot \alpha^{-1}$, which verifies 
$h^{-1}(\mathcal{G}) = f^{-1}(\alpha(\mathcal{G}))$, is a finite substitution. For each letter $a \in Dom(h)$, we 
define the (finite) alphabet $\Sigma_a = \{[a, w] \mid w \in h(a)\}$ and take $\beta([a, w]) = a$ and 
$g([a, w]) = w$ to get $g(\beta^{-1}(a)) = h(a)$. □ 
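
The construction in this proof can be replayed on a small finite example. The sketch below is illustrative only: letters are single characters, words are strings or tuples of letters, a pair `(c, w)` plays the role of the symbol $[c, w]$, and the graph and the morphisms `f` and `alpha` are hypothetical.

```python
from itertools import product as cartesian


def compose(r, s):
    """Relational composition R . S."""
    return {(p, q) for (p, m) in r for (n, q) in s if m == n}


def relation_of_word(edges, states, word):
    """Relation R_w, starting from the identity for the empty word."""
    rel = {(p, p) for p in states}
    for letter in word:
        rel = compose(rel, edges.get(letter, set()))
    return rel


def inverse_morphism(f, edges, states):
    """f^{-1}(G): the relation of a letter x is R_{f(x)} in G."""
    return {x: relation_of_word(edges, states, w) for x, w in f.items()}


def alphabetic_image(alpha, edges):
    """alpha(G): relabel every edge, merging labels with the same image."""
    out = {}
    for letter, rel in edges.items():
        out.setdefault(alpha[letter], set()).update(rel)
    return out


def delay_forgetting(f, alpha):
    """Build g and beta such that f^{-1}(alpha(G)) = beta(g^{-1}(G))."""
    preimages = {}
    for b, a in alpha.items():
        preimages.setdefault(a, []).append(b)
    g, beta = {}, {}
    for c, word in f.items():
        # h(c) = alpha^{-1}(f(c)): pull f(c) back letter by letter.
        for w in cartesian(*(preimages.get(x, []) for x in word)):
            g[(c, w)] = w      # g([c, w]) = w
            beta[(c, w)] = c   # beta([c, w]) = c
    return g, beta


# Hypothetical data.
states = {0, 1, 2, 3}
edges = {"x": {(0, 1)}, "y": {(0, 2)}, "z": {(1, 3)}}
alpha = {"x": "a", "y": "a", "z": "b"}
f = {"C": "ab", "D": "a"}

g, beta = delay_forgetting(f, alpha)
lhs = inverse_morphism(f, alphabetic_image(alpha, edges), states)
rhs = alphabetic_image(beta, inverse_morphism(g, edges, states))
print(lhs == rhs)
```

Both sides yield the same labelled relations on this example, which is what the lemma states for arbitrary graphs.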

In other words, the direct strictly alphabetic morphism, which forgets infor- 
mation, may be applied after the inverse morphism. 

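Lemma 4. If $\mathcal{H}$ is a graph and $\alpha$ is a strictly alphabetic morphism, then for 
any graph $\mathcal{G}$, we have 

$\alpha(\mathcal{G}) \times \mathcal{H} = \alpha(\mathcal{G} \times \alpha^{-1}(\mathcal{H}))$. 

Proof: Let $a$ be a label of $\mathcal{H}$. 

$R_a^{\alpha(\mathcal{G}) \times \mathcal{H}} = \{((p, p'), (q, q')) \mid pR_a^{\alpha(\mathcal{G})}q,\ p'R_a^{\mathcal{H}}q'\}$ (by product def.) 

$= \{((p, p'), (q, q')) \mid \exists b,\ a = \alpha(b),\ pR_b^{\mathcal{G}}q,\ p'R_a^{\mathcal{H}}q'\}$ (by $\alpha(\mathcal{G})$ def.) 

$= \{((p, p'), (q, q')) \mid \exists b,\ a = \alpha(b),\ pR_b^{\mathcal{G}}q,\ \exists c,\ c = \alpha(b),\ p'R_c^{\mathcal{H}}q'\}$ 

$= \{((p, p'), (q, q')) \mid \exists b,\ a = \alpha(b),\ pR_b^{\mathcal{G}}q,\ p'R_b^{\alpha^{-1}(\mathcal{H})}q'\}$ (because $c = \alpha(b)$) 

$= R_a^{\alpha(\mathcal{G} \times \alpha^{-1}(\mathcal{H}))}$ 

□ 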
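
The identity of Lemma 4 can be checked on a small case. This is a sketch under assumptions: graphs are dictionaries from labels to sets of state pairs, `alpha` is total on the labels of the first graph, and the graphs `G` and `H` below are hypothetical.

```python
def graph_product(g, h):
    """Product graph on the shared labels."""
    return {a: {((p, p2), (q, q2)) for (p, q) in g[a] for (p2, q2) in h[a]}
            for a in g.keys() & h.keys()}


def alphabetic_image(alpha, edges):
    """alpha(G): relabel every edge, merging labels with the same image."""
    out = {}
    for letter, rel in edges.items():
        out.setdefault(alpha[letter], set()).update(rel)
    return out


def inverse_alphabetic(alpha, edges):
    """alpha^{-1}(H): a letter b gets the relation of alpha(b) in H.

    Letters whose image does not label any edge of H are left out.
    """
    return {b: set(edges[a]) for b, a in alpha.items() if a in edges}


# Hypothetical data: alpha merges b1 and b2 into a.
alpha = {"b1": "a", "b2": "a"}
G = {"b1": {(0, 1)}, "b2": {(1, 2)}}
H = {"a": {("u", "v")}}

lhs = graph_product(alphabetic_image(alpha, G), H)
rhs = alphabetic_image(alpha, graph_product(G, inverse_alphabetic(alpha, H)))
print(lhs == rhs)
```

On the right-hand side the product is taken before the labels are merged, which is the “delayed forgetting” used in the proof of Theorem 4.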



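Theorem 4. Bounded path transductions are closed under composition. 

Proof: Let $T_1$ and $T_2$ be two bounded path transductions. Our aim is to show 
that the relation $T = T_1 \cdot T_2$ remains a bounded path transduction. Let $\mathcal{G}$ be a 
graph and let $\mathcal{H} = T_2(T_1(\mathcal{G}))$ hence 

$\mathcal{H} = \alpha_2(f_2^{-1}(\alpha_1(f_1^{-1}(\mathcal{G}) \times \mathcal{F}_1)) \times \mathcal{F}_2)$ 

where $f_i$ are morphisms, $\mathcal{F}_i$ are finite graphs and $\alpha_i$ are strictly alphabetic 
morphisms. By Lemma 3, there exists a morphism $g$ and a strictly alphabetic 
morphism $\beta$ such that $\alpha_1 \cdot f_2^{-1} = g^{-1} \cdot \beta$, hence 

$\mathcal{H} = \alpha_2(\beta(g^{-1}(f_1^{-1}(\mathcal{G}) \times \mathcal{F}_1)) \times \mathcal{F}_2)$. 

By Lemma 2, $g$ being a morphism, we have: 

$g^{-1}(f_1^{-1}(\mathcal{G}) \times \mathcal{F}_1) = g^{-1}(f_1^{-1}(\mathcal{G})) \times g^{-1}(\mathcal{F}_1)$, 

hence 

$\mathcal{H} = \alpha_2(\beta(g^{-1}(f_1^{-1}(\mathcal{G})) \times g^{-1}(\mathcal{F}_1)) \times \mathcal{F}_2)$. 

By Lemma 1, the composition of the two inverse morphisms $f_1^{-1}$ and $g^{-1}$ is an 
inverse morphism: if we set $h = g \cdot f_1$ and $\mathcal{F} = g^{-1}(\mathcal{F}_1)$ then we have: 

$\mathcal{H} = \alpha_2(\beta(h^{-1}(\mathcal{G}) \times \mathcal{F}) \times \mathcal{F}_2)$. 

By Lemma 4, we may substitute $\beta(h^{-1}(\mathcal{G}) \times \mathcal{F}) \times \mathcal{F}_2$ by 
$\beta(h^{-1}(\mathcal{G}) \times \mathcal{F} \times \beta^{-1}(\mathcal{F}_2))$. With $\mathcal{K} = \mathcal{F} \times \beta^{-1}(\mathcal{F}_2)$ and $\gamma = \alpha_2 \circ \beta$, 
for any graph $\mathcal{G}$ 

$\alpha_2(f_2^{-1}(\alpha_1(f_1^{-1}(\mathcal{G}) \times \mathcal{F}_1)) \times \mathcal{F}_2) = \gamma(h^{-1}(\mathcal{G}) \times \mathcal{K})$ □ 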



4 Composition of Unbounded Path Transductions 



In this section we show that the classical Elgot and Mezei theorem naturally 
extends from rational word transductions to unbounded path transductions. The 
construction is similar to the one of Section 3 but, unlike inverse morphism, 
inverse rational substitution is not compatible with graph product. In general 
we have 

$h^{-1}(\mathcal{G} \times \mathcal{H}) \neq h^{-1}(\mathcal{G}) \times h^{-1}(\mathcal{H})$. 

So we need a stronger “delay” property than Lemma 4. This property is true 
up to isolated vertices. 

A vertex/state is isolated if it is neither connected to an edge nor labelled by a 
predicate. More formally, recall that the image of a relation $R \subseteq Q \times Q$ by a 
function $f$ of domain $Q$ is $f(R) = \{(f(p), f(q)) \mid pRq\}$. We say that two $\Sigma$-$\Gamma$- 
graphs $\mathcal{G}$ and $\mathcal{H}$ are isomorphic up to isolated vertices, and we write $\mathcal{G} \approx \mathcal{H}$, if 
there is an injective function $f$ from $Q^{\mathcal{G}}$ to $Q^{\mathcal{H}}$ such that 

$\forall a \in \Sigma,\ f(R_a^{\mathcal{G}}) = R_a^{\mathcal{H}}$ and $\forall b \in \Gamma,\ f(P_b^{\mathcal{G}}) = P_b^{\mathcal{H}}$. 



Lemma 5. If $\mathcal{H}$ is a finite graph and $h$ is a rational word substitution then there 
exist a rational substitution $k$, a morphism $f$ and a finite graph $\mathcal{K}$ such that for 
any graph $\mathcal{G}$, we have 

$h^{-1}(\mathcal{G}) \times \mathcal{H} \approx k^{-1}(f^{-1}(\mathcal{G}) \times \mathcal{K})$. 

Proof: For any letter $a$ appearing in $ran(h)$³, we define a fresh symbol $\tau_a$ (used 
to denote an $\varepsilon$-transition). We define the alphabetic morphism $f$ by 

$f(a) = a$ and $f(\tau_a) = \varepsilon$ for any letter $a$ appearing in $ran(h)$. 

We build the graph $\mathcal{K}$ from the graph $\mathcal{H}$ by edge replacement: each edge labelled 
by $a$ is replaced by an automaton accepting the rational language $\tau_a h(a) \tau_a$. 

³ Recall that $ran(h) = \{v \mid \exists u,\ v \in h(u)\}$. 

[Figure: the automaton $\mathcal{G}_a$ used in the edge replacement. When $\varepsilon \notin h(a)$: 
$L(\mathcal{G}_a, 3, 4) = h(a)$. When $\varepsilon \in h(a)$: $L(\mathcal{G}_a, 3, 4) = h(a) - \{\varepsilon\}$.] 

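In other words, we have a direct substitution $\mathcal{K} = k(\mathcal{H})$ with $k$ defined by 
$k(a) = \tau_a h(a) \tau_a$ for all $a \in ran(h)$ ($k$ is continuous). Note that for any couple of 
states $p, q \in Q^{\mathcal{H}}$, any label $a$ and any word $u \in h(a)$, we have 

$pR_a^{\mathcal{H}}q \iff pR_{\tau_a u \tau_a}^{\mathcal{K}}q$ (2) 

and 

$pR_u^{\mathcal{G}}q \iff pR_{\tau_a u \tau_a}^{f^{-1}(\mathcal{G})}q$ (3) 

Let $a$ be a label of $\mathcal{H}$: 

$R_a^{h^{-1}(\mathcal{G}) \times \mathcal{H}} = \{((p, p'), (q, q')) \mid \exists u \in h(a),\ pR_u^{\mathcal{G}}q,\ p'R_a^{\mathcal{H}}q'\}$ 

$= \{((p, p'), (q, q')) \mid \exists u \in h(a),\ pR_u^{\mathcal{G}}q,\ p'R_{\tau_a u \tau_a}^{\mathcal{K}}q'\}$ by (2) 

$= \{((p, p'), (q, q')) \mid \exists u \in h(a),\ pR_{\tau_a u \tau_a}^{f^{-1}(\mathcal{G})}q,\ p'R_{\tau_a u \tau_a}^{\mathcal{K}}q'\}$ by (3) 

$= \{((p, p'), (q, q')) \mid \exists u \in h(a),\ (p, p')R_{\tau_a u \tau_a}^{f^{-1}(\mathcal{G}) \times \mathcal{K}}(q, q')\}$ (Lemma 2) 

$= \{((p, p'), (q, q')) \mid \exists u \in k(a),\ (p, p')R_u^{f^{-1}(\mathcal{G}) \times \mathcal{K}}(q, q')\}$ (by definition of $k$). 

Hence, for any $a \in ran(h)$, we have 

$R_a^{h^{-1}(\mathcal{G}) \times \mathcal{H}} = R_a^{k^{-1}(f^{-1}(\mathcal{G}) \times \mathcal{K})}$. 

The two graphs are isomorphic up to the vertices of $Q^{\mathcal{K}} - Q^{\mathcal{H}}$ which are isolated 
in $k^{-1}(f^{-1}(\mathcal{G}) \times \mathcal{K})$. □ 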

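Theorem 5. Up to isolated vertices, unbounded path transductions are closed 
under composition. 

Proof: Let $T_1$ and $T_2$ be two unbounded path transductions. Our aim is to show 
that the relation $T = T_1 \cdot T_2$ remains an unbounded path transduction. 

Let $\mathcal{G}$ be a graph and let $\mathcal{H} = T_2(T_1(\mathcal{G}))$ hence 

$\mathcal{H} = h_2^{-1}(g_2^{-1}(h_1^{-1}(g_1^{-1}(\mathcal{G}) \times \mathcal{F}_1)) \times \mathcal{F}_2)$ 

where $g_i$ and $h_i$ are rational substitutions and $\mathcal{F}_i$ are finite graphs. 
By composition of rational substitutions (Lemma 1) we deduce a rational 
substitution $g_3 = g_2 \cdot h_1$ such that 

$\mathcal{H} = h_2^{-1}(g_3^{-1}(g_1^{-1}(\mathcal{G}) \times \mathcal{F}_1) \times \mathcal{F}_2)$. 

From Lemma 5 we deduce a rational substitution $k$, a morphism $f$ and a finite 
graph $\mathcal{K}$ such that 

$g_3^{-1}(g_1^{-1}(\mathcal{G}) \times \mathcal{F}_1) \times \mathcal{F}_2 \approx k^{-1}(f^{-1}(g_1^{-1}(\mathcal{G}) \times \mathcal{F}_1) \times \mathcal{K})$. 

The function $f$ being a morphism, by Lemma 2 we get 

$g_3^{-1}(g_1^{-1}(\mathcal{G}) \times \mathcal{F}_1) \times \mathcal{F}_2 \approx k^{-1}(f^{-1}(g_1^{-1}(\mathcal{G})) \times f^{-1}(\mathcal{F}_1) \times \mathcal{K})$. 

With $g = f \cdot g_1$, $h = h_2 \cdot k$, and $\mathcal{F} = f^{-1}(\mathcal{F}_1) \times \mathcal{K}$, for any graph $\mathcal{G}$ we have 

$h_2^{-1}(g_2^{-1}(h_1^{-1}(g_1^{-1}(\mathcal{G}) \times \mathcal{F}_1)) \times \mathcal{F}_2) \approx h^{-1}(g^{-1}(\mathcal{G}) \times \mathcal{F})$ □ 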

5 Conclusion 

In the theory of languages and automata, there was an asymmetry between an 
algebraic characterization for word languages and a “mechanical” one for their 
acceptors. Much work has been done to exhibit useful algebraic language trans- 
formations but the corresponding acceptor transformations remained scattered 
and hidden in technical lemmas. 

Infinite labelled graphs/automata give a natural framework to study the 
behavior of acceptors. We proposed the notion of path transduction which is 
a graph transformation. With this definition, the correspondence between the 
graph transformation and the language transformation becomes clearer than it 
is in [17]. The AFG and full-AFG defined in [17] may be respectively defined by 
closure under bounded and unbounded path transductions. 

We also showed that path transductions are closed under composition, general- 
izing the classical Elgot and Mezei theorem from words to graphs. This property 
gives a canonical form for graphs in principal AFG like pushdown graphs [13] or 
prefix recognizable graphs [6]. 

To deepen the knowledge about graphs as language acceptors, a systematic 
study of language-compatible graph transformations and their application on 
infinite automata is required. The generalization of rational word transductions 
to path transductions is one step in this systematic study. 

Path transductions are strictly weaker than monadic transductions, but their 
expressive power is important: for example the transition graphs of Petri nets 
are the images of hypergrids by path transductions. Another interesting aspect 
that is not developed here is that, being weaker than monadic transductions, 
path transductions also preserve weaker decidability properties like reachability 
under rational control. 

Abstract. We propose a faithful encoding of Java programs (written in 
a suitable fragment of the language) to Graph Transformation Systems. 
Every program is translated to a set of rules including some basic rules, 
common to all programs and providing the operational semantics of Java 
(data and control) operators, and the program specific rules, namely one 
rule for each method or constructor declared in the program. 

Besides sketching some potential applications of the proposed transla- 
tion, we discuss some design choices that ensure its correctness, and we 
report on how we intend to extend it in order to handle several other 
features of the Java language. 



1 Introduction 

Graph Transformation Systems (GTSs) were introduced about four decades ago 
as a generalization of Chomsky grammars to non-linear structures. Since then, 
the theory evolved quickly [24] , and a growing community has applied GTSs to 
diverse application fields [7]. Along the years, GTSs have been used to provide an 
operational semantics to programming languages [22] , to systems made of agents 
(or actors) interacting via message passing [15,16,6], to object-oriented speci- 
fication formalisms [26], to process calculi [10], and to several other languages 
and formalisms. 

By encoding a computational formalism into GTSs, usually one enjoys several 
benefits. Firstly, the representation of states as graphs is often quite abstract and 
intuitive, as it allows one to ignore irrelevant details: in fact, graphs are considered 
usually “up to isomorphism”, and this corresponds to considering states up to 
the renaming of bound variables or names. Secondly, the rich theory of GTSs 
provides interesting results that may be applied to the original formalism via the 
encoding. As an example, the concurrent behavior of GTSs has been thoroughly 
studied and a consolidated theory of concurrency is now available, including 
event structure, process and unfolding semantics [23,3]; the encoding of process 
calculi as GTSs equips such calculi with a concurrent operational semantics [10], 
to which the mentioned results can be applied. 

* Work partially supported by projects IQ-Mobile (CNPq and CNR), PLATUS 
(CNPq), AGILE (EU FET - Global Computing) and SegraVis (EU Research 
Training Network). 

Along these lines, in this paper we propose a translation of Java programs, 
written in a suitable fragment of the language, into Graph Transformation Sys- 
tems. The fragment of Java for which we present such an encoding is small, 
but still significant. It includes operators on primitive data types, class decla- 
rations, assignments, conditional statements, (static or instance) method invo- 
cations with parameter passing and value returning, and construction of new 
objects. A Java declaration, i.e., a closed set of class definitions, is translated 
into a graph of types and a set of typed rules that specify the (potential) behavior 
of the program. The states of execution of a Java program are represented by 
graphs which encode aspects related to both the data structures and the control 
of the program. As expected, information which is not relevant at run-time, 
like names of local variables or of formal parameters of methods, does not appear 
explicitly in the graphical representation of states. The rules resulting from the 
translation of a Java declaration are divided into two kinds: the basic rules, that 
are common to all Java programs and specify the operational meaning of the var- 
ious (functional and control) operators of the language; and the program specific 
rules, one for each method or constructor, which replace a method/constructor 
call with the corresponding body. After presenting the translation, we discuss 
some design choices we made, and also how we intend to extend the transla- 
tion in order to handle other features of Java, including arrays, inheritance and 
multi-threading. 

There are several potential applications of the kind of translation we propose: 
we sketch here two of them. Recently, various analysis techniques for GTSs have 
been proposed, some of which are based on the unfolding semantics (see, e.g., [2] 
for the finite-state case, and [4] for the general case) . We intend to investigate how 
far such techniques can be applied to the systems obtained from the translation 
of Java programs, and thus, indirectly, to the Java programs themselves: the 
leading intuition is that relevant properties of Java programs can be formulated, 
in a GTSs setting, as structural properties of the graphs reachable in the system. 
Having this application in mind, we restricted the format of the rules obtained 
from the translation in order to match the constraints imposed by [2,4], the 
most relevant of which is that rules must be injective. 

Another application we have in mind is to provide a pure GTS-based seman- 
tics to Java-attributed GTSs, the graph transformation model adopted in tools 
like AGG [1]. In this model, the items of a graph can be associated with arbitrary 
Java expressions (the attributes), and in a rewriting step such expressions are 
evaluated to compute the attributes of the newly created items. By exploiting the 
proposed translation of Java to GTS, we may translate a Java-attributed GTS 
to a plain typed GTS by first encoding the attributes as graphs, and then simu- 
lating an attributed graph rewriting step as a sequence of typed graph rewriting 
steps, where the evaluation of the attributes is performed graphically. 

On the one hand, the translation just sketched would allow us to explore 
the applicability of the above mentioned analysis techniques to Java-attributed 
GTSs. On the other hand, we are confident that it will provide a formal ground on 
which to address a well-known limitation of the expressive power of attributed 
GTSs, namely the fact that graphical items may refer to attributes, but at- 
tributes cannot refer back to graphical items. For example, according to the 
standard approaches [17], it is not possible to define a graph where a vertex has 
as attribute a list of vertices of the graph itself. 

The paper is organized as follows. Section 2 introduces the basics of typed
(hyper)graph transformation systems according to the Single-Pushout approach.
Section 3 presents the fragment of Java that we shall consider, and Section 4
shows how to translate programs written in this fragment into typed GTSs.
Section 5 discusses some design choices underlying the proposed translation and
how we intend to handle other features of Java, and Section 6 concludes by
sketching some subjects of future work.

2 Typed Graph Transformation Systems 

In this section we recall the definition of graph transformation systems (GTSs)
according to the Single-Pushout (SPO) approach [18,8]. However, it is worth
stressing that we could equivalently have used other approaches, for example
the Double-Pushout approach: in fact, for the kind of graphs and rules generated
by the translation of Java to GTSs, the two approaches are equivalent.

We shall define GTSs over (typed) hypergraphs, i.e., graphs where edges can
be connected to any (finite) number of vertices. Graphically, an edge is depicted
as a box (whose shape may vary), and the connections to the vertices are drawn
as thin lines, called tentacles. Usually the tentacles of an edge are labeled by
natural numbers: here we propose a slightly more general definition, allowing us
to label tentacles with labels taken from a fixed set (see Figure 2).
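
As an informal illustration of hypergraphs with labeled tentacles, the following
Python sketch represents each edge as a partial map from labels to vertices.
The class name Hypergraph, the label set {"this", "val", "next"} and the vertex
names are hypothetical, chosen only for this example; they are not taken from
the translation presented later.

```python
from dataclasses import dataclass, field


@dataclass
class Hypergraph:
    """A hypergraph whose tentacles carry labels from a fixed set (illustrative only)."""
    labels: frozenset                           # the fixed set of tentacle labels
    vertices: set = field(default_factory=set)
    edges: dict = field(default_factory=dict)   # edge name -> {label: vertex}

    def add_edge(self, name, tentacles):
        # An edge may use any subset of the labels (the map is partial),
        # but every label must be known and every target must be a vertex.
        unknown = set(tentacles) - self.labels
        dangling = set(tentacles.values()) - self.vertices
        if unknown or dangling:
            raise ValueError(f"bad tentacles: labels={unknown}, vertices={dangling}")
        self.edges[name] = dict(tentacles)

    def attached(self, name):
        """Vertices reached by the tentacles of the given edge."""
        return set(self.edges[name].values())


# Hypothetical label set and graph, for illustration only.
g = Hypergraph(labels=frozenset({"this", "val", "next"}))
g.vertices |= {"v1", "v2", "v3"}
g.add_edge("cell", {"this": "v1", "next": "v2"})   # no "val" tentacle: partial map
```

Because the tentacle map is partial, the edge "cell" above simply has no
tentacle labeled "val"; labeling by natural numbers is the special case where
the label set is an initial segment of the naturals.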

The definition of the SPO approach is based on a category of graphs and 
partial morphisms. 

Definition 1 (weak commutativity). Given two partial functions $f, f' : A \to B$,
we say that $f$ is less defined than $f'$ (and we write $f \le f'$) if
$dom(f) \subseteq dom(f')$ and $f(x) = f'(x)$ for all $x \in dom(f)$. Given two
partial functions $f : A \to B$ and $f' : A' \to B'$, and two total functions
$a : A \to A'$ and $b : B \to B'$, we say that the resulting diagram commutes
weakly if $b \circ f \le f' \circ a$.
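
The definition of weak commutativity can be checked mechanically on finite
examples. The sketch below is only an illustration, assuming that partial
functions are encoded as Python dictionaries (an element is outside the domain
when it is not a key); the function names and the sample sets are hypothetical.

```python
def less_defined(f, g):
    """f <= g: dom(f) is contained in dom(g) and the two agree on dom(f)."""
    return all(x in g and g[x] == f[x] for x in f)


def compose(g, f):
    """Composition g o f of partial functions, defined wherever both steps are."""
    return {x: g[y] for x, y in f.items() if y in g}


def commutes_weakly(f, f_prime, a, b):
    """Check b o f <= f' o a for partial f, f' and total a, b."""
    return less_defined(compose(b, f), compose(f_prime, a))


# Hypothetical finite example: f is undefined on 2, f' is total.
f = {1: "p"}                      # partial function A -> B, with A = {1, 2}
f_prime = {10: "P", 20: "Q"}      # partial function A' -> B'
a = {1: 10, 2: 20}                # total function A -> A'
b = {"p": "P", "q": "Q"}          # total function B -> B'

weak = commutes_weakly(f, f_prime, a, b)
strict = compose(b, f) == compose(f_prime, a)
```

In this example the square commutes weakly but not strictly: the lower path
$f' \circ a$ is defined on 2, while the upper path $b \circ f$ is not.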

Now we introduce hypergraphs and partial morphisms. Each hyperedge is
associated with a finite, labeled set of vertices (formally, with a partial function from
a fixed set of labels to vertices). This definition makes use of a labeling functor,
defined as follows: Let $L$ be a set of labels. The labeling functor
$X_L : \mathbf{Set}_P \to \mathbf{Set}_P$ maps each set $V \in |\mathbf{Set}_P|$ to the
set of $L$-labeled sets over $V$ (i.e., to the set of partial functions $L \to V$),
and each partial function $f : V \to V' \in$




